@Article{Meyer:Kurtz:Backofen:Struc_fast_index:2011,
  author =	 {Fernando Meyer and Stefan Kurtz and Rolf Backofen
                  and Sebastian Will and Michael Beckstette},
  title =	 {Structator: fast index-based search for {RNA} 
                  sequence-structure patterns},
  journal =	 {BMC Bioinformatics},
  year =	 2011,
  volume =	 12,
  number =	 1,
  pages =	 214,
  user =	 {backofen},
  pmid =	 21619640,
  doi = 	 {10.1186/1471-2105-12-214},
  issn = 	 {1471-2105},
  abstract =	 {BACKGROUND: The secondary structure of RNA 
                  molecules is intimately related to their function and often 
                  more conserved than the sequence. Hence, the important task 
                  of searching databases for RNAs requires matching 
                  sequence-structure patterns. Unfortunately, current tools 
                  for this task have, in the best case, a running time that is 
                  only linear in the size of sequence databases. Furthermore, 
                  established index data structures for fast sequence 
                  matching, like suffix trees or arrays, cannot benefit from 
                  the complementarity constraints introduced by the secondary 
                  structure of RNAs. RESULTS: We present a novel method and 
                  readily applicable software for time-efficient matching of 
                  RNA sequence-structure patterns in sequence databases. Our 
                  approach is based on affix arrays, a recently introduced 
                  index data structure, preprocessed from the target database. 
                  Affix arrays support bidirectional pattern search, which is 
                  required for efficiently handling the structural constraints 
                  of the pattern. Structural patterns like stem-loops can be 
                  matched inside out, such that the loop region is matched 
                  first and then the pairing bases on the boundaries are 
                  matched consecutively. This allows exploiting base pairing 
                  information for search space reduction and leads to an 
                  expected running time that is sublinear in the size of the 
                  sequence database. The incorporation of a new chaining 
                  approach in the search of RNA sequence-structure patterns 
                  enables the description of molecules folding into complex 
                  secondary structures with multiple ordered patterns. The 
                  chaining approach removes spurious matches from the set of 
                  intermediate results, in particular of patterns with little 
                  specificity. In benchmark experiments on the Rfam database, 
                  our method runs up to two orders of magnitude faster than 
                  previous methods. CONCLUSIONS: The presented method's 
                  sublinear expected running time makes it well suited for RNA 
                  sequence-structure pattern matching in large sequence 
                  databases. RNA molecules containing several stem-loop 
                  substructures can be described by multiple 
                  sequence-structure patterns and their matches are 
                  efficiently handled by a novel chaining method. Beyond our 
                  algorithmic contributions, we provide Structator, a 
                  complete and robust open-source software solution for 
                  index-based search of RNA sequence-structure patterns. The 
                  Structator software is available at 
                  http://www.zbh.uni-hamburg.de/Structator.}
}