LocARNA-1.8.11
LocARNA::MultipleAlignment Class Reference

Represents a multiple alignment.

#include <multiple_alignment.hh>

Classes

class  AliColumn
 Read-only proxy class representing a column of the alignment.
 
struct  AnnoType
 Type of sequence annotation; enumerates the legal annotation types.
 
struct  FormatType
 File format type for multiple alignments.
 
class  SeqEntry
 A row in a multiple alignment.
 

Public Types

typedef size_t size_type
 size type
 
typedef std::vector< SeqEntry >::const_iterator const_iterator
 const iterator of sequence entries
 

Public Member Functions

 MultipleAlignment ()
 Construct empty.
 
 MultipleAlignment (const std::string &file, FormatType::type format=FormatType::CLUSTAL)
 Construct from file.
 
 MultipleAlignment (std::istream &in, FormatType::type format=FormatType::CLUSTAL)
 Construct from stream.
 
 MultipleAlignment (const std::string &name, const std::string &sequence)
 Construct as degenerate alignment of one sequence.
 
 MultipleAlignment (const std::string &nameA, const std::string &nameB, const std::string &alistringA, const std::string &alistringB)
 Construct as pairwise alignment from names and alignment strings.
 
 MultipleAlignment (const Alignment &alignment, bool only_local=false, bool special_gap_symbols=false)
 Construct from Alignment object.
 
 MultipleAlignment (const AlignmentEdges &edges, const Sequence &seqA, const Sequence &seqB)
 Construct from alignment edges and sequences.
 
virtual ~MultipleAlignment ()
 virtual destructor
 
const Sequence & as_sequence () const
 "cast" multiple alignment to sequence
 
void normalize_rna_symbols ()
 normalize rna symbols
 
size_type num_of_rows () const
 Number of rows of multiple alignment.
 
bool empty () const
 Emptiness check.
 
const SequenceAnnotation & annotation (const AnnoType::type &annotype) const
 Read access of annotation by prefix.
 
void set_annotation (const AnnoType::type &annotype, const SequenceAnnotation &annotation)
 Write access to annotation.
 
bool has_annotation (const AnnoType::type &annotype) const
 Annotation availability.
 
bool is_proper () const
 Test whether alignment is proper.
 
pos_type length () const
 Length of multiple alignment.
 
const_iterator begin () const
 Begin for read-only traversal of name/sequence pairs.
 
const_iterator end () const
 End for read-only traversal of name/sequence pairs.
 
bool contains (std::string name) const
 Test whether name exists.
 
size_type index (const std::string &name) const
 Access index by name.
 
const SeqEntry & seqentry (size_type index) const
 Access name/sequence pair by index.
 
const SeqEntry & seqentry (const std::string &name) const
 Access name/sequence pair by name.
 
size_type deviation (const MultipleAlignment &ma) const
 Deviation of a multiple alignment from a reference alignment.
 
double sps (const MultipleAlignment &ma, bool compalign=true) const
 Sum-of-pairs score between a multiple alignment and a reference alignment.
 
double cmfinder_realignment_score (const MultipleAlignment &ma) const
 Cmfinder realignment score of a multiple alignment to a reference alignment.
 
double avg_deviation_score (const MultipleAlignment &ma) const
 Average deviation score.
 
std::string consensus_sequence () const
 Consensus sequence of multiple alignment.
 
AliColumn column (size_type col_index) const
 Access alignment column.
 
void append (const SeqEntry &seqentry)
 Append sequence entry.
 
void prepend (const SeqEntry &seqentry)
 Prepend sequence entry.
 
void operator+= (const AliColumn &c)
 Append a column.
 
void operator+= (char c)
 Append the same character to each row.
 
void reverse ()
 reverse the multiple alignment
 
std::ostream & write (std::ostream &out, FormatType::type format=MultipleAlignment::FormatType::CLUSTAL) const
 Write alignment to stream.
 
std::ostream & write (std::ostream &out, size_t width, FormatType::type format=MultipleAlignment::FormatType::CLUSTAL) const
 Write alignment to stream (wrapped).
 
std::ostream & write_name_sequence_line (std::ostream &out, const std::string &name, const std::string &sequence, size_t namewidth) const
 Write formatted line of name and sequence.
 
std::ostream & write (std::ostream &out, size_type start, size_type end, FormatType::type format=MultipleAlignment::FormatType::CLUSTAL) const
 Write sub-alignment to stream.
 
bool checkAlphabet (const Alphabet< char > &alphabet) const
 check character constraints
 
void write_debug (std::ostream &out=std::cout) const
 Print contents of object to stream.
 

Static Public Member Functions

static size_t num_of_annotypes ()
 number of annotation types
 

Protected Member Functions

void init (const AlignmentEdges &edges, const Sequence &seqA, const Sequence &seqB, bool special_gap_symbols)
 Initialize from alignment edges and sequences.
 

Detailed Description

Represents a multiple alignment.

The multiple alignment is implemented as vector of name/sequence pairs.

Supports traversal of name/sequence pairs. The sequence entries support mapping from columns to positions and back.

Names are unique in a multiple alignment object.

Todo:
constructors ensure name uniqueness; if constructing as alignment of alignments, names are made unique if needed.

Sequence positions and column indices are 1-based (1..len).
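
The mapping between columns and sequence positions mentioned above can be pictured with a small stand-alone sketch. It does not use LocARNA's SeqEntry; the function names `column_to_position` and `position_to_column`, the set of gap symbols, and the treatment of gap columns are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical gap test: '-', '.', '_' and '~' count as gaps (assumption).
inline bool is_gap(char c) {
    return c == '-' || c == '.' || c == '_' || c == '~';
}

// Number of non-gap symbols in columns 1..col (col is 1-based).
// For a non-gap column this is its sequence position; for a gap column
// it is the position of the closest symbol to the left (0 if none).
std::size_t column_to_position(const std::string &row, std::size_t col) {
    assert(col >= 1 && col <= row.size());
    std::size_t pos = 0;
    for (std::size_t i = 0; i < col; ++i)
        if (!is_gap(row[i])) ++pos;
    return pos;
}

// Column (1-based) holding the pos-th non-gap symbol, or 0 if the row
// has fewer than pos symbols.
std::size_t position_to_column(const std::string &row, std::size_t pos) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < row.size(); ++i)
        if (!is_gap(row[i]) && ++seen == pos) return i + 1;
    return 0;
}
```

For the row `AC--GU`, column 5 maps to position 3 and position 4 maps back to column 6 under these assumptions.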

MultipleAlignment can have anchor and structure annotation and can read and write them.

Note
this class is agnostic of the type of sequences in the alignment; it neither checks for 'allowed characters' nor transforms characters. However, normalize_rna_symbols() is provided to normalize RNA sequences, which the alignment engines and Vienna folding routines generally assume.
Todo:
because this class does not know whether it contains RNA, it would be useful to have a derived class RNAMultipleAlignment. This class could guarantee that its sequences are normalized RNA sequences. Consequently, we could enforce by the type system that RnaEnsemble is generated only from RNAMultipleAlignment etc.
Todo:
automatically adapt output width for sequence names to longest name

Constructor & Destructor Documentation

LocARNA::MultipleAlignment::MultipleAlignment ( const std::string &  file,
FormatType::type  format = FormatType::CLUSTAL 
)

Construct from file.

Parameters
file: name of input file
format: file format (see also FormatType)
Exceptions
failure: on read problems
See also
MultipleAlignment(std::istream &in)
LocARNA::MultipleAlignment::MultipleAlignment ( std::istream &  in,
FormatType::type  format = FormatType::CLUSTAL 
)

Construct from stream.

Parameters
in: input stream with alignment in clustalW-like format
format: file format (see also FormatType)
Exceptions
failure: on read errors
LocARNA::MultipleAlignment::MultipleAlignment ( const std::string &  name,
const std::string &  sequence 
)

Construct as degenerate alignment of one sequence.

Parameters
name: name of sequence
sequence: sequence string
LocARNA::MultipleAlignment::MultipleAlignment ( const std::string &  nameA,
const std::string &  nameB,
const std::string &  alistringA,
const std::string &  alistringB 
)

Construct as pairwise alignment from names and alignment strings.

Parameters
nameA: name of sequence A
nameB: name of sequence B
alistringA: alignment string of sequence A
alistringB: alignment string of sequence B
Note
handling of gap-symbols: use same gap symbols as in given alistrings
LocARNA::MultipleAlignment::MultipleAlignment ( const Alignment &  alignment,
bool  only_local = false,
bool  special_gap_symbols = false 
)

Construct from Alignment object.

Parameters
alignment: object of type Alignment
only_local: if true, construct only local alignment
special_gap_symbols: if true, use special distinct gap symbols for gaps due to loop deletion ('_') or sparsification ('~')
Note
Automatically computes a consensus anchor string if anchors are available. Consensus anchors containing duplicate names are cleared. Does not compute any consensus structure, even if structure annotation of sequences A and B in Alignment is available.
LocARNA::MultipleAlignment::MultipleAlignment ( const AlignmentEdges &  edges,
const Sequence &  seqA,
const Sequence &  seqB 
)

Construct from alignment edges and sequences.

Parameters
edges: alignment edges
seqA: sequence A
seqB: sequence B
Note
Automatically computes a consensus anchor string if anchors are available. Consensus anchors containing duplicate names are cleared. Does not compute any consensus structure, even if structure annotation of sequences A and B is available.

Member Function Documentation

const SequenceAnnotation & LocARNA::MultipleAlignment::annotation ( const AnnoType::type &  annotype) const

Read access of annotation by prefix.

Parameters
annotype: type of annotation
Returns
sequence annotation
Note
returns ref to empty annotation if annotation is not available
void LocARNA::MultipleAlignment::append ( const SeqEntry &  seqentry)

Append sequence entry.

Parameters
seqentry: new sequence entry
Precondition
*this is empty or entry must have same size as *this
const Sequence & LocARNA::MultipleAlignment::as_sequence ( ) const

"cast" multiple alignment to sequence

Note
this works like an upcast; this is ok, as long as sequence does not specify attributes
double LocARNA::MultipleAlignment::avg_deviation_score ( const MultipleAlignment &  ma) const

Average deviation score.

Parameters
ma: multiple alignment
Returns
average deviation of alignment ma to reference alignment *this
Precondition
the sequences of ma have to occur in the alignment *this
Note
this is not the same as deviation (and may not even be very similar)!
const_iterator LocARNA::MultipleAlignment::begin ( ) const
inline

Begin for read-only traversal of name/sequence pairs.

Returns
begin iterator
bool LocARNA::MultipleAlignment::checkAlphabet ( const Alphabet< char > &  alphabet) const

check character constraints

Check whether the alignment contains characters from the given alphabet only and, if warn, print warnings otherwise.

Parameters
alphabet: alphabet of admissible characters
Returns
whether all characters are in the alphabet
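
The check described above can be pictured with a stand-alone sketch. It represents the alphabet as a plain `std::string` instead of LocARNA's `Alphabet<char>`, prints no warnings, and the name `only_alphabet_chars` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true iff every character of every row occurs in `alphabet`.
// The alphabet is a plain string here (assumption for this sketch).
bool only_alphabet_chars(const std::vector<std::string> &rows,
                         const std::string &alphabet) {
    assert(!alphabet.empty()); // precondition of this sketch only
    for (const std::string &row : rows)
        if (row.find_first_not_of(alphabet) != std::string::npos)
            return false;
    return true;
}
```

Note that gap symbols have to be part of the alphabet in this sketch, otherwise any gapped row fails the check.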
double LocARNA::MultipleAlignment::cmfinder_realignment_score ( const MultipleAlignment &  ma) const

Cmfinder realignment score of a multiple alignment to a reference alignment.

Parameters
ma: multiple alignment
Returns
cmfinder realignment score of ma to reference alignment *this
Note
this score was defined in Elfar Torarinsson, Zizhen Yao, Eric D. Wiklund, et al. Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Res. 2008 (Section Realignment calculation)
Precondition
the sequences of ma have to occur in the alignment *this
AliColumn LocARNA::MultipleAlignment::column ( size_type  col_index) const
inline

Access alignment column.

Parameters
col_index: column index
Returns
proxy for the alignment column with index col_index (1-based)
std::string LocARNA::MultipleAlignment::consensus_sequence ( ) const

Consensus sequence of multiple alignment.

Consensus sequence by simple majority in each column. Assumes that only ASCII characters (code < 127) occur.

Returns
consensus sequence as string
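
A column-wise majority vote can be sketched as follows. This is a stand-alone illustration, not LocARNA's implementation: the function name `majority_consensus`, the tie-breaking rule (smallest character code wins) and the fact that gap symbols are counted like any other character are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Majority-vote consensus over equally long rows of 7-bit ASCII text.
// Ties go to the smaller character code (assumption of this sketch).
std::string majority_consensus(const std::vector<std::string> &rows) {
    if (rows.empty()) return "";
    const std::size_t len = rows.front().size();
    std::string consensus(len, ' ');
    for (std::size_t col = 0; col < len; ++col) {
        std::array<std::size_t, 128> count{}; // one counter per ASCII code
        for (const std::string &row : rows) {
            assert(row.size() == len);
            const unsigned char c = static_cast<unsigned char>(row[col]);
            assert(c < 128);
            ++count[c];
        }
        std::size_t best = 0;
        for (std::size_t c = 1; c < count.size(); ++c)
            if (count[c] > count[best]) best = c;
        consensus[col] = static_cast<char>(best);
    }
    return consensus;
}
```

The fixed-size counter array is what makes the ASCII restriction necessary in a sketch like this; a `std::map<char, std::size_t>` would lift it at some cost.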
bool LocARNA::MultipleAlignment::contains ( std::string  name) const

Test whether name exists.

Parameters
name: name of a sequence
Returns
whether sequence with given name exists in multiple alignment
size_type LocARNA::MultipleAlignment::deviation ( const MultipleAlignment &  ma) const

Deviation of a multiple alignment from a reference alignment.

Parameters
ma: multiple alignment
Returns
deviation of ma from reference alignment *this. Deviation is defined for realignment with limited deviation from a reference alignment, as performed when --max-diff-aln is given together with --max-diff to locarna.
Precondition
the sequences of ma have to occur in the alignment *this
bool LocARNA::MultipleAlignment::empty ( ) const
inline

Emptiness check.

Returns
whether the object contains no sequences
Note
an alignment containing one or more empty sequences is not empty in this sense.
const_iterator LocARNA::MultipleAlignment::end ( ) const
inline

End for read-only traversal of name/sequence pairs.

Returns
end iterator
bool LocARNA::MultipleAlignment::has_annotation ( const AnnoType::type &  annotype) const
inline

Annotation availability

Parameters
annotype: annotation type
Returns
whether annotations of the given type are available
size_type LocARNA::MultipleAlignment::index ( const std::string &  name) const
inline

Access index by name.

Precondition
name exists
Parameters
name: name of a sequence
Returns
index of name/sequence pair with given name
void LocARNA::MultipleAlignment::init ( const AlignmentEdges &  edges,
const Sequence &  seqA,
const Sequence &  seqB,
bool  special_gap_symbols 
)
protected

Initialize from alignment edges and sequences.

Parameters
edges: alignment edges
seqA: sequence A
seqB: sequence B
special_gap_symbols: if true, use special distinct gap symbols for gaps due to loop deletion ('_') or sparsification ('~')
bool LocARNA::MultipleAlignment::is_proper ( ) const

Test whether alignment is proper.

Returns
whether all sequences have the same length
pos_type LocARNA::MultipleAlignment::length ( ) const
inline

Length of multiple alignment.

Note
Assumes proper alignment. Does not check whether all sequences have the same length!
Returns
length of first sequence in alignment
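
The relation between properness and length can be sketched on plain strings. The helper names below are hypothetical, and returning 0 for an alignment without rows is an assumption, since the behaviour of length() on an empty alignment is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True iff all rows have the same length; no rows counts as proper here.
bool all_rows_same_length(const std::vector<std::string> &rows) {
    for (const std::string &row : rows)
        if (row.size() != rows.front().size()) return false;
    return true;
}

// Length of the first row, without looking at the others.
// Returns 0 if there are no rows (assumption of this sketch).
std::size_t first_row_length(const std::vector<std::string> &rows) {
    return rows.empty() ? 0 : rows.front().size();
}

// Checked variant: verifies properness before reporting the length.
std::size_t checked_length(const std::vector<std::string> &rows) {
    assert(all_rows_same_length(rows));
    return first_row_length(rows);
}
```

On a ragged input, `first_row_length` still returns a number, which is why a caller that cannot guarantee properness would test it first.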
void LocARNA::MultipleAlignment::normalize_rna_symbols ( )

normalize rna symbols

See also
normalize_rna_sequence()

Normalize the symbols in all aligned sequences assuming that they code for RNA

static size_t LocARNA::MultipleAlignment::num_of_annotypes ( )
inline static

number of annotation types

Returns
number of annotation types
size_type LocARNA::MultipleAlignment::num_of_rows ( ) const
inline

Number of rows of multiple alignment.

Returns
number of rows
void LocARNA::MultipleAlignment::operator+= ( const AliColumn &  c)

Append a column.

Parameters
c: column that is appended
void LocARNA::MultipleAlignment::operator+= ( char  c)

Append the same character to each row.

Parameters
c: character that is appended
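
Both column-appending operators can be pictured on a plain vector of row strings. This sketch represents a column as a `std::string` with one character per row instead of an AliColumn; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Append one column: column[i] is added to the end of row i.
void append_column(std::vector<std::string> &rows, const std::string &column) {
    assert(column.size() == rows.size());
    for (std::size_t i = 0; i < rows.size(); ++i)
        rows[i].push_back(column[i]);
}

// Append the same character to every row, expressed as a uniform column.
void append_char(std::vector<std::string> &rows, char c) {
    append_column(rows, std::string(rows.size(), c));
}
```

Either operation keeps a proper alignment proper, because every row grows by exactly one character.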
void LocARNA::MultipleAlignment::prepend ( const SeqEntry &  seqentry)

Prepend sequence entry.

Parameters
seqentry: new sequence entry
Precondition
*this is empty or entry must have same size as *this
Note
prepend is a lot more costly than append; its cost is linear in the number of rows
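
Why prepending is linear in the number of rows can be seen in a minimal stand-in. The struct below assumes that rows live in a vector and that a name-to-index map is kept next to it; whether LocARNA stores its index this way is not stated here, and `MiniAlignment` and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a multiple alignment: rows plus a name index.
struct MiniAlignment {
    std::vector<std::pair<std::string, std::string>> rows; // (name, aligned row)
    std::map<std::string, std::size_t> index_of;           // name -> 0-based row

    // Documented precondition plus name uniqueness.
    void check(const std::string &name, const std::string &seq) const {
        assert(index_of.count(name) == 0);
        assert(rows.empty() || rows.front().second.size() == seq.size());
    }

    // Appending touches only the new row.
    void append(const std::string &name, const std::string &seq) {
        check(name, seq);
        index_of[name] = rows.size();
        rows.emplace_back(name, seq);
    }

    // Prepending shifts every existing row and renumbers the whole index.
    void prepend(const std::string &name, const std::string &seq) {
        check(name, seq);
        rows.insert(rows.begin(), std::make_pair(name, seq));
        for (std::size_t i = 0; i < rows.size(); ++i)
            index_of[rows[i].first] = i;
    }
};
```

When many entries have to end up in front, appending them and reversing the row order once avoids repeating the linear step for every entry.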
const SeqEntry& LocARNA::MultipleAlignment::seqentry ( size_type  index) const
inline

Access name/sequence pair by index.

Precondition
index in range 0..num_of_rows()-1
Parameters
index: index of name/sequence pair (0-based)
Returns
sequence (including gaps) with given index
const SeqEntry& LocARNA::MultipleAlignment::seqentry ( const std::string &  name) const
inline

Access name/sequence pair by name.

Parameters
name: name of name/sequence pair
Returns
sequence (including gaps) with given name
void LocARNA::MultipleAlignment::set_annotation ( const AnnoType::type &  annotype,
const SequenceAnnotation &  annotation 
)
inline

Write access to annotation.

Parameters
annotype: annotation type
annotation: sequence annotation
Todo:
check that annotation is valid for multiple alignment; throw failure if annotation is not valid
double LocARNA::MultipleAlignment::sps ( const MultipleAlignment &  ma,
bool  compalign = true 
) const

Sum-of-pairs score between a multiple alignment and a reference alignment.

Parameters
ma: multiple alignment
compalign: whether to compute score like compalign
Returns
sum-of-pairs score of ma from reference alignment *this
Note
Whereas the sps score for compalign==FALSE counts common matches only, the compalign score additionally counts common indels.
Precondition
the sequences of ma have to occur in the alignment *this
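
The "common matches only" variant can be illustrated for a single pair of rows. This is a textbook-style sketch, not LocARNA's code: the normalization by the number of reference matches, the score 0 for a reference without matches, the use of '-' as the only gap symbol and all names are assumptions. The compalign variant, which also counts common indels, is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// (position in A, position in B), both 1-based.
using Match = std::pair<std::size_t, std::size_t>;

// Collect the matched position pairs of a pairwise alignment.
std::set<Match> matched_pairs(const std::string &rowA, const std::string &rowB) {
    assert(rowA.size() == rowB.size());
    std::set<Match> matches;
    std::size_t posA = 0, posB = 0;
    for (std::size_t col = 0; col < rowA.size(); ++col) {
        const bool baseA = rowA[col] != '-';
        const bool baseB = rowB[col] != '-';
        if (baseA) ++posA;
        if (baseB) ++posB;
        if (baseA && baseB) matches.insert(Match(posA, posB));
    }
    return matches;
}

// Fraction of the reference matches that the test alignment reproduces.
double pairwise_match_score(const std::string &refA, const std::string &refB,
                            const std::string &testA, const std::string &testB) {
    const std::set<Match> ref = matched_pairs(refA, refB);
    const std::set<Match> test = matched_pairs(testA, testB);
    if (ref.empty()) return 0.0; // assumption: no reference matches scores 0
    std::size_t common = 0;
    for (const Match &m : ref)
        if (test.count(m) > 0) ++common;
    return static_cast<double>(common) / static_cast<double>(ref.size());
}
```

A score over a whole multiple alignment would sum the common and reference counts over all pairs of rows before dividing, under the same assumptions.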
std::ostream & LocARNA::MultipleAlignment::write ( std::ostream &  out,
FormatType::type  format = MultipleAlignment::FormatType::CLUSTAL 
) const

Write alignment to stream.

Parameters
out: output stream
format: alignment format; default: CLUSTAL (see also FormatType)
Returns
output stream

Writes one line "<name> <seq>" for each sequence; moreover, writes annotations.

Note
does not write format header
std::ostream & LocARNA::MultipleAlignment::write ( std::ostream &  out,
size_t  width,
FormatType::type  format = MultipleAlignment::FormatType::CLUSTAL 
) const

Write alignment to stream (wrapped)

Parameters
out: output stream
width: line width at which the output is wrapped
format: alignment format; default: CLUSTAL (see also FormatType)
Returns
output stream

Writes lines "<name> <seq>" per sequence, wraps lines at width

Note
does not write format header
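
Wrapped output of "<name> <seq>" lines can be sketched as follows. The block layout (a blank line between blocks, names left-aligned and padded to a fixed width) is an assumption for illustration and is not claimed to match LocARNA's exact output; `write_wrapped` and `wrapped_to_string` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Row = std::pair<std::string, std::string>; // (name, aligned row)

// Write the rows in blocks of at most `width` columns.
// Assumes a proper alignment: all rows have the same length.
std::ostream &write_wrapped(std::ostream &out, const std::vector<Row> &rows,
                            std::size_t width, std::size_t namewidth) {
    assert(width > 0);
    const std::size_t len = rows.empty() ? 0 : rows.front().second.size();
    for (std::size_t start = 0; start < len; start += width) {
        if (start > 0) out << '\n'; // blank line between blocks
        for (const Row &row : rows) {
            assert(row.second.size() == len);
            out << std::left << std::setw(static_cast<int>(namewidth))
                << row.first << ' ' << row.second.substr(start, width) << '\n';
        }
    }
    return out;
}

// Convenience wrapper that collects the output in a string.
std::string wrapped_to_string(const std::vector<Row> &rows, std::size_t width,
                              std::size_t namewidth) {
    std::ostringstream os;
    write_wrapped(os, rows, width, namewidth);
    return os.str();
}
```

Padding every name to the same width keeps the sequence blocks aligned in columns, which is the point of passing a name width at all.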
std::ostream & LocARNA::MultipleAlignment::write ( std::ostream &  out,
size_type  start,
size_type  end,
FormatType::type  format = MultipleAlignment::FormatType::CLUSTAL 
) const

Write sub-alignment to stream.

Write from position start to position end to output stream out; write lines "<name> <seq>"

Parameters
out: output stream
start: start column (1-based)
end: end column (1-based)
format: alignment format; default: CLUSTAL (see also FormatType)
Returns
output stream
void LocARNA::MultipleAlignment::write_debug ( std::ostream &  out = std::cout) const

Print contents of object to stream.

Parameters
out: output stream
std::ostream & LocARNA::MultipleAlignment::write_name_sequence_line ( std::ostream &  out,
const std::string &  name,
const std::string &  sequence,
size_t  namewidth 
) const

Write formatted line of name and sequence.

The line is formatted such that it fits the output of the write methods.

Parameters
out: output stream
name: name string
sequence: sequence string
namewidth: width of the name field
Returns
output stream
