PMID- 28334186
OWN - NLM
STAT- Publisher
DA  - 20170323
LR  - 20170323
IS  - 1367-4811 (Electronic)
IS  - 1367-4803 (Linking)
DP  - 2017 Feb 27
TI  - RNAscClust: clustering RNA sequences using structure conservation and graph based motifs.
LID - 10.1093/bioinformatics/btx114 [doi]
AB  - Motivation: Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take the compensatory base pair changes obtained from structure conservation in orthologous sequences into account. Results: Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. For a set of multiple structural alignments of RNA sequences, each containing a paralog sequence included in a structural alignment of its orthologs, RNAscClust computes minimum free-energy structures for each sequence using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments. Availability: RNAscClust is available at http://www.bioinf.uni-freiburg.de/Software/RNAscClust. Contact: gorodkin@rth.dk or backofen@informatik.uni-freiburg.de. Supplementary information: Supplementary data are available at Bioinformatics online.
FAU - Miladi, Milad
AU  - Miladi M
AD  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany.
FAU - Junge, Alexander
AU  - Junge A
AD  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark.
AD  - Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Denmark.
FAU - Costa, Fabrizio
AU  - Costa F
AD  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany.
FAU - Seemann, Stefan E
AU  - Seemann SE
AD  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark.
AD  - Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Denmark.
FAU - Hull Havgaard, Jakob
AU  - Hull Havgaard J
AD  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark.
AD  - Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Denmark.
FAU - Gorodkin, Jan
AU  - Gorodkin J
AD  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark.
AD  - Department of Veterinary Clinical and Animal Sciences, University of Copenhagen, Denmark.
FAU - Backofen, Rolf
AU  - Backofen R
AD  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Germany.
AD  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark.
AD  - Center for Biological Signalling Studies (BIOSS), University of Freiburg, Germany.
LA  - eng
PT  - Journal Article
DEP - 20170227
PL  - England
TA  - Bioinformatics
JT  - Bioinformatics (Oxford, England)
JID - 9808944
EDAT- 2017/03/24 06:00
MHDA- 2017/03/24 06:00
CRDT- 2017/03/24 06:00
PHST- 2016/06/06 [received]
PHST- 2017/02/21 [accepted]
AID - 3056002 [pii]
AID - 10.1093/bioinformatics/btx114 [doi]
PST - aheadofprint
SO  - Bioinformatics. 2017 Feb 27. doi: 10.1093/bioinformatics/btx114.