RNAscClust: Clustering RNA sequences using structure conservation and graph based motifs

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

RNAscClust: Clustering RNA sequences using structure conservation and graph based motifs. / Miladi, Milad; Junge, Alexander; Costa, Fabrizio; Seemann, Stefan E.; Havgaard, Jakob Hull; Gorodkin, Jan; Backofen, Rolf.

In: Bioinformatics, Vol. 33, No. 14, 2017, p. 2089-2096.

Harvard

Miladi, M, Junge, A, Costa, F, Seemann, SE, Havgaard, JH, Gorodkin, J & Backofen, R 2017, 'RNAscClust: Clustering RNA sequences using structure conservation and graph based motifs', Bioinformatics, vol. 33, no. 14, pp. 2089-2096. https://doi.org/10.1093/bioinformatics/btx114

APA

Miladi, M., Junge, A., Costa, F., Seemann, S. E., Havgaard, J. H., Gorodkin, J., & Backofen, R. (2017). RNAscClust: Clustering RNA sequences using structure conservation and graph based motifs. Bioinformatics, 33(14), 2089-2096. https://doi.org/10.1093/bioinformatics/btx114

Vancouver

Miladi M, Junge A, Costa F, Seemann SE, Havgaard JH, Gorodkin J et al. RNAscClust: Clustering RNA sequences using structure conservation and graph based motifs. Bioinformatics. 2017;33(14):2089-2096. https://doi.org/10.1093/bioinformatics/btx114

Author

Miladi, Milad; Junge, Alexander; Costa, Fabrizio; Seemann, Stefan E.; Havgaard, Jakob Hull; Gorodkin, Jan; Backofen, Rolf. / RNAscClust: Clustering RNA sequences using structure conservation and graph based motifs. In: Bioinformatics. 2017; Vol. 33, No. 14. pp. 2089-2096.

Bibtex

@article{ab6c1b88e2294502ab70d55d0400b727,
title = "RNAscClust: Clustering RNA sequences using structure conservation and graph based motifs",
abstract = "Motivation: Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take the compensatory base pair changes obtained from structure conservation in orthologous sequences into account. Results: Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. For a set of multiple structural alignments of RNA sequences, each containing a paralog sequence included in a structural alignment of its orthologs, RNAscClust computes minimum free-energy structures for each sequence using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments.",
author = "Milad Miladi and Alexander Junge and Fabrizio Costa and Seemann, {Stefan E.} and Havgaard, {Jakob Hull} and Jan Gorodkin and Rolf Backofen",
year = "2017",
doi = "10.1093/bioinformatics/btx114",
language = "English",
volume = "33",
pages = "2089--2096",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "14",

}

RIS

TY - JOUR

T1 - RNAscClust: Clustering RNA sequences using structure conservation and graph based motifs

AU - Miladi, Milad

AU - Junge, Alexander

AU - Costa, Fabrizio

AU - Seemann, Stefan E.

AU - Havgaard, Jakob Hull

AU - Gorodkin, Jan

AU - Backofen, Rolf

PY - 2017

Y1 - 2017

N2 - Motivation: Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take the compensatory base pair changes obtained from structure conservation in orthologous sequences into account. Results: Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. For a set of multiple structural alignments of RNA sequences, each containing a paralog sequence included in a structural alignment of its orthologs, RNAscClust computes minimum free-energy structures for each sequence using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments.

AB - Motivation: Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take the compensatory base pair changes obtained from structure conservation in orthologous sequences into account. Results: Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. For a set of multiple structural alignments of RNA sequences, each containing a paralog sequence included in a structural alignment of its orthologs, RNAscClust computes minimum free-energy structures for each sequence using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments.

U2 - 10.1093/bioinformatics/btx114

DO - 10.1093/bioinformatics/btx114

M3 - Journal article

C2 - 28334186

AN - SCOPUS:85024483430

VL - 33

SP - 2089

EP - 2096

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 14

ER -