The LocARNA package comprises several tools for producing fast and high-quality pairwise and multiple alignments of RNA sequences of unknown structure. These tools build on the Turner free energy model of RNAs to simultaneously fold and align (or match) RNAs based on their sequence and structure features.
The tools come with many practically relevant features like support of anchor and structure constraints, support of local folding, and effective heuristics. The package comprises the following tools
Some other closely related tools, which however are not distributed with LocARNA, may be of interest:
LocARNA and LocARNA-P are integrated in the Freiburg RNA Tools web server.
Older releases (please use the newest release unless you have very good reasons not to):
Please find further related publications in the publication list of the group.
What's new in and around the package?
SPARSE improves over the original locarna algorithm in terms of speed. Moreover, it implements an advanced lightweight simultaneous alignment and folding model, which improves its structure prediction capabilities. Currently, the tool offers a trade-off between alignment accuracy and speed. Thus, the choice of either algorithm should be based on the specific application requirements. SPARSE-specific code is contributed by Milad Miladi.
ExpaRNA-P enumerates exactly matching local sequence-structure patterns in RNAs of unknown structure, supporting full structural flexibility according to RNA secondary structure energy models (inherited from the Vienna RNA Package). Based on ExpaRNA-P's exact matching, the tool ExpLoc-P performs very fast simultaneous alignment and folding of RNAs (think: "like LocARNA, but faster"). For highly efficient prediction, ExpaRNA-P introduces novel ensemble-based sparsification techniques, which are also used by SPARSE. ExpaRNA-P-specific code and the classes for the strong ensemble-based sparsification of ExpaRNA-P and SPARSE are contributed by Christina Otto (née Schmiedl).
REAPR applies LocARNA for structure-based alignment of whole genomes to predict structural non-coding RNAs. With REAPR, we introduced a new realignment mode to LocARNA. In this mode, LocARNA aligns very fast within a small distance to a reference multiple alignment (mlocarna options --max-diff-aln and --max-diff).
LocARNA runs on recent GNU/Linux systems. Installation follows the usual "autotools" scheme (configure/make).
tar xzf locarna-xxx.tar.gz
LocARNA requires the Vienna RNA Package (>=2.1.1).
If the Vienna RNA Package is not installed in a standard path, please call configure with the option --with-vrna="path to VRNA installation" or, preferably, after setting the environment variable PKG_CONFIG_PATH to the pkg-config directory containing RNAlib2.pc.
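The pkg-config route described above can be sketched in Python. This is a minimal sketch under stated assumptions, not part of LocARNA: the helper names env_with_vrna and run_configure and the example path in the docstring are hypothetical; only the variable PKG_CONFIG_PATH, the file name RNAlib2.pc, and the configure script come from the text above.

```python
import os
import subprocess


def env_with_vrna(pc_dir, base_env=None):
    """Return a copy of the environment with pc_dir prepended to PKG_CONFIG_PATH.

    pc_dir is assumed to be the directory containing RNAlib2.pc
    (hypothetical example: /opt/vrna/lib/pkgconfig).
    """
    env = dict(os.environ if base_env is None else base_env)
    old = env.get("PKG_CONFIG_PATH", "")
    # Prepend so that this Vienna RNA installation takes precedence.
    env["PKG_CONFIG_PATH"] = pc_dir + (os.pathsep + old if old else "")
    return env


def run_configure(src_dir, pc_dir):
    """Run ./configure in the unpacked LocARNA source directory (hypothetical helper)."""
    return subprocess.run(
        ["./configure"], cwd=src_dir, env=env_with_vrna(pc_dir), check=True
    )
```

Prepending rather than overwriting keeps any pkg-config directories that are already configured on the system.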
See ./configure --help for further options.
Some systems require adding the path to LocARNA's library to the environment variable LD_LIBRARY_PATH. If libLocARNA is not found by the tools, extend the search path by
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/lib"
(in bash; assuming locarna is installed in the default path /usr/local/lib.)
There are two major uses of the tools: pairwise and multiple alignment, and clustering of RNAs. The workhorse of the package is the program locarna. However, we recommend using our high-level scripts mlocarna, locarnate, and RNAclust.pl.
Assume an input file rnas.fa in fasta format, containing several RNA sequences.
For computing a multiple alignment of these RNAs call
mlocarna rnas.fa
In its default settings, mlocarna will produce a global multiple alignment of your RNAs. The program writes some output to the screen as well as output to a directory rnas.out, where the name is derived from the input name by default. The output directory can be controlled by the option --tgtdir. Help on the many options to mlocarna is available by mlocarna --help or more conveniently mlocarna --man. The distribution contains some example input in sub-directory Examples.
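For scripted use, the call described above can be wrapped in a few lines of Python. This is a sketch under stated assumptions: the function names mlocarna_command and run_mlocarna are hypothetical, and the option is passed as a separate argument ("--tgtdir DIR"), which is assumed to be accepted by mlocarna's option parser. Only the program name mlocarna and the option --tgtdir are taken from the text.

```python
import shutil
import subprocess


def mlocarna_command(fasta, tgtdir=None):
    """Build the argument list for a default (global) mlocarna run."""
    cmd = ["mlocarna"]
    if tgtdir is not None:
        # --tgtdir replaces the default output directory derived from the input name.
        cmd += ["--tgtdir", tgtdir]
    cmd.append(fasta)
    return cmd


def run_mlocarna(fasta, tgtdir=None):
    """Run mlocarna if it is on PATH; return its exit code, or None if it is missing."""
    cmd = mlocarna_command(fasta, tgtdir)
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd).returncode
```

Keeping the command construction separate from its execution makes it easy to inspect or log the exact call before running it.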
Anchor and Structure Constraints. The tool mlocarna provides a convenient interface for user-specified constraints on the alignment, including anchor constraints as well as structure constraints. Constraints are specified in the fasta file as follows:
>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
.......(((..(((xxxx))).)))...... #S
.........AAAAAA.BBBCCCC......... #1
.........123456.1231234......... #2
>fdhA
CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
..............(((.....xxxxxx......)))........... #S
...........AAAAAA.....BBB.........CCCC.......... #1
...........123456.....123.........1234.......... #2
The structure constraints (lines #S) inherit their semantics from RNAfold. Consequently, the alignment can only be guided by base pair matches that are compatible with the given constraints. The anchor constraints are specified by giving unique names to certain sequence positions, here A1,A2,A3,A4,A5,A6,B1,B2,B3,C1,C2,C3,C4 (lines #1,#2). Each name is read column-wise: the character in line #1 followed by the character in line #2. Positions of the same name in different sequences are aligned. In each sequence, names have to be unique.
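To make the column-wise reading of the anchor lines concrete, here is a minimal Python sketch that extracts the anchor names from the example above and checks their uniqueness within each sequence. It is an illustration only: the function names are hypothetical, the parser is not LocARNA's own, and it assumes one sequence line per record with annotation lines ending in " #S", " #1", " #2".

```python
# Illustrative sketch; parse_constrained_fasta and anchors are hypothetical
# helper names, not part of LocARNA.

EXAMPLE = """\
>fruA
CCUCGAGGGGAACCCGAAAGGGACCCGAGAGG
.......(((..(((xxxx))).)))...... #S
.........AAAAAA.BBBCCCC......... #1
.........123456.1231234......... #2
>fdhA
CGCCACCCUGCGAACCCAAUAUAAAAUAAUACAAGGGAGCAGGUGGCG
..............(((.....xxxxxx......)))........... #S
...........AAAAAA.....BBB.........CCCC.......... #1
...........123456.....123.........1234.......... #2
"""


def parse_constrained_fasta(text):
    """Collect, per record, the sequence and its '#<tag>' annotation lines."""
    records = {}
    name = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0].startswith(">"):
            name = parts[0][1:]
            records[name] = {"seq": "", "tags": {}}
        elif name is not None:
            if len(parts) == 2 and parts[1].startswith("#"):
                records[name]["tags"][parts[1][1:]] = parts[0]
            else:
                records[name]["seq"] += parts[0]
    return records


def anchors(record):
    """Map anchor names (e.g. 'A1') to 0-based positions; names must be unique."""
    rows = [record["tags"][k] for k in sorted(record["tags"]) if k.isdigit()]
    result = {}
    for pos, chars in enumerate(zip(*rows)):
        if all(c == "." for c in chars):
            continue  # no anchor at this position
        label = "".join(chars)  # column-wise: line #1 char, then line #2 char
        if label in result:
            raise ValueError(f"anchor name {label!r} is not unique")
        result[label] = pos
    return result


records = parse_constrained_fasta(EXAMPLE)
fruA, fdhA = anchors(records["fruA"]), anchors(records["fdhA"])
# Positions sharing a name are the ones the constraints force to be aligned.
forced = sorted((n, fruA[n], fdhA[n]) for n in fruA.keys() & fdhA.keys())
```

In this example both sequences define the same 13 anchor names, so forced holds 13 position pairs; for instance, A1 ties position 9 of fruA to position 11 of fdhA (0-based).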
A second, slightly larger example of constraints is provided in Examples/haca.snoRNA.fa of the LocARNA package.
Assume again an input file rnas.fa in fasta format, containing several RNA sequences.
For computing a local multiple alignment by locarnate call
locarnate rnas.fa
In its default settings, locarnate will produce a structure and sequence local multiple alignment of your RNAs. For most cases, we recommend turning off both types of locality to get a global multiple alignment:
locarnate --no-struc --no-seq rnas.fa
The program writes its output to a subdirectory of the directory results; the name of the subdirectory is derived from the input name. The output directory can be controlled by the option --results. Help on the options to locarnate is available by locarnate --help. The distribution contains some example input in sub-directory Examples.
Please jump to the RNAclust section.
The pairwise alignment tool is called with two input files that specify the input sequences and optionally ensemble probabilities (e.g., as generated by RNAfold -p). It accepts different file formats, which can be mixed freely. Available input formats are listed in order of increasing expressivity.
Further help on mlocarna, locarnate, and locarna is available via
mlocarna --help
locarna --help
locarnate --help
man mlocarna
man locarna
LocARNA implements a C++ API to its various algorithms and data structures. The library is installed together with the package, is used by the LocARNA programs themselves, and can be linked as a shared library to other programs.
API HTML Documentation. This documentation can be generated by doxygen from the package sources by make doxygen-doc.
In probabilistic mode (mlocarna option --probabilistic), LocARNA computes more accurate multiple alignments based on a probabilistic consistency transformation, and it provides reliability profiles for assessing local alignment quality and localizing RNA motifs. These features are based on computing sequence and structure match probabilities under the LocARNA alignment model.
RNAclust is a tool for clustering of RNAs, which builds on LocARNA. RNAclust is written and copyrighted by Kristin Reiche. It replaces the cluster pipeline that was used for our paper "Inferring non-coding RNA families and classes by means of genome-scale structure-based clustering".
RNAclust.pl --fasta your_sequences --dir output_directory
The full documentation of RNAclust.pl is available as PDF.
LocARNATE is an alternative way to construct multiple alignments using locarna. While mlocarna implements various types of progressive and iterative alignment, where sequence-structure alignment is performed in each single step, LocARNATE employs T-Coffee for combining pairwise locarna alignments into a multiple alignment.
LocARNATE is written and copyrighted by Wolfgang Otto.