Bioinformatics
Institute of Computer Science
University Freiburg

RNAscClust - clustering RNA sequences using structure conservation and graph based motifs

Synopsis

RNAscClust is a pipeline to cluster a set of structured RNAs taking their respective structural conservation into account. The aim of RNAscClust is to aid the discovery of families and classes of ncRNAs.

The input to RNAscClust is a set of multiple structural alignments of RNA sequences. Each alignment contains an RNA sequence from a species of interest structurally aligned to homologous sequences. RNAscClust computes minimum free-energy structures for each sequence from the species of interest using conserved base pairs as prior information for the folding. The sequences originating from the organism of interest are then clustered using a graph kernel-based strategy, which identifies common structural features.

Download

The source code is available as a tarball.

Latest release: RNAscClust 1.1.1

Previous releases can be found under: Releases/

Installation and Usage

Instructions on installation and usage of the source package can be found in the file README.md included in the downloaded tarball.
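
As a minimal sketch of the first step, the helper below unpacks a downloaded tarball and prints the location of the README.md inside it. It assumes the tarball is gzip-compressed. The function name `unpack_and_find_readme` and the file name in the usage comment are hypothetical and only serve as illustration; the actual installation steps are those in README.md.

```shell
#!/bin/sh
# Unpack a source tarball into a target directory and print the path of
# every README.md it contains.
# Usage (hypothetical file name): unpack_and_find_readme RNAscClust-1.1.1.tar.gz src
unpack_and_find_readme() {
    tarball="$1"
    dest="${2:-.}"
    mkdir -p "$dest" || return 1
    # -x extract, -z gunzip, -f archive file, -C change to the target directory
    tar -xzf "$tarball" -C "$dest" || return 1
    find "$dest" -type f -name 'README.md'
}
```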

RNAscClust Docker Image

RNAscClust is available as a Docker container on Docker Hub. With the Docker container, the pipeline can be set up in a few minutes without installing any of the dependencies by hand. The container also makes it easy to reproduce all figures and tables shown in the Results section of the RNAscClust paper (see the reference under Publication) by executing a short sequence of command-line instructions. The RNAscClust Docker container supports multi-core execution.

Alternatively, the pipeline can be installed directly on your machine by following the instructions referred to under Installation and Usage, preferably on a computing platform that supports the Sun Grid Engine (SGE) cluster system.

Docker acquisition and usage recipe

Running the Docker image requires a Docker client and the user permissions needed to use it. With both in place, execute:

  docker pull mmiladi/rnascclust:latest
  docker run -it -h dockersgeserver mmiladi/rnascclust:latest

On startup, the Docker container prints examples of how to run the pipeline and the evaluations.
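
If the commands above fail, a quick preflight check can show whether the Docker client is missing or whether the current user cannot reach the Docker daemon. The following is only a sketch: the function name `check_docker` and its messages are illustrative and not part of RNAscClust. It relies on `docker info`, which fails when the daemon is not running or the user lacks permission to access it.

```shell
#!/bin/sh
# Preflight sketch: report whether a Docker client is installed and
# whether the current user can talk to the Docker daemon.
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker client not found in PATH"
        return 1
    fi
    # 'docker info' contacts the daemon; it fails if the daemon is not
    # running or the user is not permitted to access it.
    if ! docker info >/dev/null 2>&1; then
        echo "docker client found, but the daemon is not reachable"
        return 2
    fi
    echo "docker is ready"
    return 0
}

# Run the check; a failure only prints a hint and does not abort the shell.
check_docker || echo "fix the Docker setup before pulling the image"
```

A return code of 2 usually points to a stopped daemon or to missing permissions for the current user.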

Executing the following series of commands reproduces all figures and tables shown in the Results section of the RNAscClust paper (see the reference under Publication):

  # Inside a terminal of the host system:
  docker pull mmiladi/rnascclust:latest
  docker run -it -v "$(pwd)/cluster_evaluation":/cluster_evaluation -h dockersgeserver mmiladi/rnascclust:latest
  # Inside the Docker container:
  cd /; bash /rnascclust/bin/clustering/run_clustering_docker.sh >cluster_evaluation/run_clustering_docker.log 2>&1

After the clustering has finished, which takes about 2 hours, the directory cluster_evaluation contains the figures as .pdf files and the tables as .txt files, named as in the manuscript.
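
To inspect the outcome on the host, a small helper such as the following sketch can list the generated files and show the end of the log. The function name `list_results` is illustrative and not part of RNAscClust. The search is recursive because the exact layout inside cluster_evaluation is not assumed here; the log path is the one written by the command above.

```shell
#!/bin/sh
# List the figures (.pdf) and tables (.txt) found in a results directory.
# The directory defaults to ./cluster_evaluation, the host directory that
# is mounted into the container by the 'docker run' command above.
list_results() {
    dir="${1:-cluster_evaluation}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    find "$dir" -type f \( -name '*.pdf' -o -name '*.txt' \) | sort
}

# Only act if the results directory exists in the current directory.
if [ -d cluster_evaluation ]; then
    list_results cluster_evaluation
    # Show the last lines of the log written during the clustering run.
    if [ -f cluster_evaluation/run_clustering_docker.log ]; then
        tail -n 20 cluster_evaluation/run_clustering_docker.log
    fi
fi
```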

Benchmark data sets

Datasets used in the paper for benchmarking can be downloaded from here.

License

The software is available under the GNU General Public License, version 3 (GPLv3).

Publication

Milad Miladi*, Alexander Junge*, Fabrizio Costa, Stefan E. Seemann, Jakob Hull Havgaard, Jan Gorodkin, and Rolf Backofen. RNAscClust: clustering RNA sequences using structure conservation and graph based motifs. Bioinformatics 33, no. 14 (2017): 2089-2096.

*these authors contributed equally to this work

Contact

RNAscClust is developed by the Chair for Bioinformatics, University of Freiburg and the Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen.

For scientific questions, please contact:

For technical questions, please contact: