LocARNA-1.9.2
LocARNA::MultipleAlignment Class Reference

Represents a multiple alignment.

#include <multiple_alignment.hh>

Inheritance diagram for LocARNA::MultipleAlignment:
LocARNA::Sequence


Classes

class  AliColumn
 read-only proxy class representing a column of the alignment
struct  AnnoType
 type of sequence annotation; enumerates the legal annotation types
struct  FormatType
 file format type for multiple alignments
class  SeqEntry
 A row in a multiple alignment.

Public Types

typedef size_t size_type
 size type
typedef std::vector< SeqEntry >::const_iterator  const_iterator
 const iterator of sequence entries

Public Member Functions

 MultipleAlignment ()
 Construct empty.
 MultipleAlignment (const std::string &file, FormatType::type format=FormatType::CLUSTAL)
 Construct from file.
 MultipleAlignment (std::istream &in, FormatType::type format=FormatType::CLUSTAL)
 Construct from stream.
 MultipleAlignment (const std::string &name, const std::string &sequence)
 Construct as degenerate alignment of one sequence.
 MultipleAlignment (const std::string &nameA, const std::string &nameB, const std::string &alistringA, const std::string &alistringB)
 Construct as pairwise alignment from names and alignment strings.
 MultipleAlignment (const Alignment &alignment, bool only_local=false, bool special_gap_symbols=false)
 Construct from Alignment object.
 MultipleAlignment (const AlignmentEdges &edges, const Sequence &seqA, const Sequence &seqB)
 Construct from alignment edges and sequences.
virtual ~MultipleAlignment ()
 virtual destructor
const Sequence & as_sequence () const
 "cast" multiple alignment to sequence
void normalize_rna_symbols ()
 normalize rna symbols
size_type num_of_rows () const
 Number of rows of multiple alignment.
bool empty () const
 Emptiness check.
const SequenceAnnotation & annotation (const AnnoType::type &annotype) const
 Read access of annotation by prefix.
void set_annotation (const AnnoType::type &annotype, const SequenceAnnotation &annotation)
 Write access to annotation.
bool has_annotation (const AnnoType::type &annotype) const
bool is_proper () const
 Test whether alignment is proper.
pos_type length () const
 Length of multiple alignment.
const_iterator begin () const
 Begin for read-only traversal of name/sequence pairs.
const_iterator end () const
 End for read-only traversal of name/sequence pairs.
bool contains (const std::string &name) const
 Test whether name exists.
size_type index (const std::string &name) const
 Access index by name.
const SeqEntry & seqentry (size_type index) const
 Access name/sequence pair by index.
const SeqEntry & seqentry (const std::string &name) const
 Access name/sequence pair by name.
size_type deviation (const MultipleAlignment &ma) const
 Deviation of a multiple alignment from a reference alignment.
double sps (const MultipleAlignment &ma, bool compalign=true) const
 Sum-of-pairs score between a multiple alignment and a reference alignment.
double cmfinder_realignment_score (const MultipleAlignment &ma) const
 Cmfinder realignment score of a multiple alignment to a reference alignment.
double avg_deviation_score (const MultipleAlignment &ma) const
 Average deviation score.
std::string consensus_sequence () const
 Consensus sequence of multiple alignment.
AliColumn column (size_type col_index) const
 Access alignment column.
void append (const SeqEntry &seqentry)
 Append sequence entry.
void prepend (const SeqEntry &seqentry)
 Prepend sequence entry.
void operator+= (const AliColumn &c)
 Append a column.
void operator+= (char c)
 Append the same character to each row.
void reverse ()
 reverse the multiple alignment
std::ostream & write (std::ostream &out, FormatType::type format=MultipleAlignment::FormatType::CLUSTAL) const
 Write alignment to stream.
std::ostream & write (std::ostream &out, size_t width, FormatType::type format=MultipleAlignment::FormatType::CLUSTAL) const
 Write alignment to stream (wrapped)
std::ostream & write_name_sequence_line (std::ostream &out, const std::string &name, const std::string &sequence, size_t namewidth) const
 Write formatted line of name and sequence.
std::ostream & write (std::ostream &out, size_type start, size_type end, FormatType::type format=MultipleAlignment::FormatType::CLUSTAL) const
 Write sub-alignment to stream.
bool checkAlphabet (const Alphabet< char > &alphabet) const
 check character constraints
void write_debug (std::ostream &out=std::cout) const
 Print contents of object to stream.

Static Public Member Functions

static size_t num_of_annotypes ()
 number of annotation types

Protected Member Functions

void init (const AlignmentEdges &edges, const Sequence &seqA, const Sequence &seqB, bool special_gap_symbols)
 Initialize from alignment edges and sequences.

Detailed Description

Represents a multiple alignment.

The multiple alignment is implemented as vector of name/sequence pairs.

Supports traversal of name/sequence pairs. The sequence entries support mapping from columns to positions and back.

Names are unique in a multiple alignment object.

Todo:
constructors ensure name uniqueness; if constructing as alignment of alignments, names are made unique if needed.

Sequence positions and column indices are 1-based, i.e. they range over 1..len.
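
The 1-based convention and the mapping between columns and positions can be illustrated with a small standalone sketch. It does not use LocARNA; the helper names is_gap, pos_to_col_table and col_to_pos are hypothetical, and the set of gap characters ('-', '_', '~') is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not LocARNA API): which characters count as gaps.
// The choice of '-', '_' and '~' is an assumption for this sketch.
inline bool is_gap(char c) {
    return c == '-' || c == '_' || c == '~';
}

// For a gapped alignment row, return a table t with t[pos] = column of the
// pos-th non-gap symbol. Positions and columns are 1-based; t[0] is an
// unused placeholder set to 0.
std::vector<std::size_t> pos_to_col_table(const std::string &row) {
    std::vector<std::size_t> table(1, 0);
    for (std::size_t col = 1; col <= row.size(); ++col) {
        if (!is_gap(row[col - 1])) {
            table.push_back(col);
        }
    }
    return table;
}

// Return the 1-based position of the symbol in column col, or 0 if the
// row holds a gap in that column.
std::size_t col_to_pos(const std::string &row, std::size_t col) {
    assert(col >= 1 && col <= row.size());
    if (is_gap(row[col - 1])) {
        return 0;
    }
    std::size_t pos = 0;
    for (std::size_t c = 1; c <= col; ++c) {
        if (!is_gap(row[c - 1])) {
            ++pos;
        }
    }
    return pos;
}
```

For the row "AC--GU", the third sequence position lies in column 5, and column 3 holds a gap.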

MultipleAlignment can have anchor and structure annotation and can read and write them.

Note:
this class is agnostic of the type of sequences in the alignment; it does not check for 'allowed characters' nor transform characters. However, normalize_rna_symbols() is provided to perform a normalization in the case of RNAs, which is generally assumed by the alignment engines and Vienna folding routines.
Todo:
because this class does not know whether it contains RNA, it would be useful to have a derived class RNAMultipleAlignment. This class could guarantee that its sequences are normalized RNA sequences. Consequently, we could enforce by the type system that RnaEnsemble is generated only from RNAMultipleAlignment etc.
Todo:
automatically adapt output width for sequence names to longest name

Constructor & Destructor Documentation

Construct from file.

Parameters:
file  name of input file
format  file format (see also: FormatType)
Exceptions:
failure  on read problems
See also:
MultipleAlignment(std::istream &in)

Construct from stream.

Parameters:
in  input stream with alignment in clustalW-like format
format  file format (see also: FormatType)
Exceptions:
failure  on read errors
LocARNA::MultipleAlignment::MultipleAlignment ( const std::string &  name,
const std::string &  sequence 
)

Construct as degenerate alignment of one sequence.

Parameters:
name  name of sequence
sequence  sequence string
LocARNA::MultipleAlignment::MultipleAlignment ( const std::string &  nameA,
const std::string &  nameB,
const std::string &  alistringA,
const std::string &  alistringB 
)

Construct as pairwise alignment from names and alignment strings.

Parameters:
nameA  name of sequence A
nameB  name of sequence B
alistringA  alignment string of sequence A
alistringB  alignment string of sequence B
Note:
handling of gap symbols: the gap symbols of the given alignment strings are used unchanged
LocARNA::MultipleAlignment::MultipleAlignment ( const Alignment &  alignment,
bool  only_local = false,
bool  special_gap_symbols = false 
)

Construct from Alignment object.

Parameters:
alignment  object of type Alignment
only_local  if true, construct only local alignment
special_gap_symbols  if true, use special distinct gap symbols for gaps due to loop deletion ('_') or sparsification ('~')
Note:
Automatically computes a consensus anchor string if anchors are available. Consensus anchors containing duplicate names are cleared. Does not compute any consensus structure, even if structure annotation of sequences A and B in Alignment is available.
LocARNA::MultipleAlignment::MultipleAlignment ( const AlignmentEdges &  edges,
const Sequence &  seqA,
const Sequence &  seqB 
)

Construct from alignment edges and sequences.

Parameters:
edges  alignment edges
seqA  sequence A
seqB  sequence B
Note:
Automatically computes a consensus anchor string if anchors are available. Consensus anchors containing duplicate names are cleared. Does not compute any consensus structure, even if structure annotation of sequences A and B is available.

Member Function Documentation

Read access of annotation by prefix.

Parameters:
annotype  type of annotation
Returns:
sequence annotation
Note:
returns a reference to an empty annotation if the annotation is not available
void LocARNA::MultipleAlignment::append ( const SeqEntry &  seqentry)

Append sequence entry.

Parameters:
seqentry  new sequence entry
Precondition:
*this is empty or the entry has the same length as *this

"cast" multiple alignment to sequence

Note:
this works like an upcast; this is safe as long as Sequence does not declare attributes of its own

Average deviation score.

Parameters:
ma  multiple alignment
Returns:
average deviation of alignment ma to reference alignment *this
Precondition:
the sequences of ma have to occur in the alignment *this
Note:
this is not the same as deviation() (and may not even be very similar)!

Begin for read-only traversal of name/sequence pairs.

Returns:
begin iterator
bool LocARNA::MultipleAlignment::checkAlphabet ( const Alphabet< char > &  alphabet) const

check character constraints

Check whether the alignment contains characters from the given alphabet only and, if warn, print warnings otherwise.

Parameters:
alphabet  alphabet of admissible characters
Returns:
whether all characters are in the alphabet

Cmfinder realignment score of a multiple alignment to a reference alignment.

Parameters:
ma  multiple alignment
Returns:
cmfinder realignment score of ma to reference alignment *this
Note:
this score was defined in Elfar Torarinsson, Zizhen Yao, Eric D. Wiklund, et al. Comparative genomics beyond sequence-based alignments: RNA structures in the ENCODE regions. Genome Res. 2008 (Section Realignment calculation)
Precondition:
the sequences of ma have to occur in the alignment *this

Access alignment column.

Parameters:
col_index  column index
Returns:
proxy for the alignment column with index col_index (1-based)

Consensus sequence of multiple alignment.

Computes the consensus sequence by simple majority in each column. Assumes that only ASCII characters with codes below 127 occur.

Returns:
consensus sequence as string
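
A column-wise simple majority vote can be sketched as follows. This is a standalone illustration on plain strings, not LocARNA's implementation: the function name majority_consensus is hypothetical, rows are assumed to have equal length, and ties are broken in favor of the smaller character code, which is an assumption of the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not LocARNA API): consensus by simple majority.
// Assumes all rows have the same length and contain only 7-bit ASCII.
// Tie-breaking (smallest character code wins) is an assumption.
std::string majority_consensus(const std::vector<std::string> &rows) {
    if (rows.empty()) {
        return std::string();
    }
    const std::size_t len = rows.front().size();
    std::string consensus;
    consensus.reserve(len);
    for (std::size_t col = 0; col < len; ++col) {
        std::array<std::size_t, 128> count{}; // one zero-initialized counter per ASCII code
        for (const std::string &row : rows) {
            assert(row.size() == len);
            const unsigned char c = static_cast<unsigned char>(row[col]);
            assert(c < 128);
            ++count[c];
        }
        // Pick the most frequent character; strict '>' keeps the first maximum.
        std::size_t best = 0;
        for (std::size_t c = 1; c < count.size(); ++c) {
            if (count[c] > count[best]) {
                best = c;
            }
        }
        consensus.push_back(static_cast<char>(best));
    }
    return consensus;
}
```

The fixed array of 128 counters mirrors the stated assumption that only ASCII characters occur.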
bool LocARNA::MultipleAlignment::contains ( const std::string &  name) const

Test whether name exists.

Parameters:
name  name of a sequence
Returns:
whether sequence with given name exists in multiple alignment

Deviation of a multiple alignment from a reference alignment.

Parameters:
ma  multiple alignment
Returns:
deviation of ma from reference alignment *this. Deviation is defined as for realignment with limited deviation from a reference alignment, as performed when --max-diff-aln is given together with --max-diff to locarna.
Precondition:
the sequences of ma have to occur in the alignment *this
bool LocARNA::MultipleAlignment::empty ( ) const [inline]

Emptiness check.

Returns:
whether the object contains no sequences
Note:
an alignment containing one or more empty sequences is not empty in this sense.

End for read-only traversal of name/sequence pairs.

Returns:
end iterator
bool LocARNA::MultipleAlignment::has_annotation ( const AnnoType::type &  annotype) const [inline]

Annotation availability

Parameters:
annotype  annotation type
Returns:
whether annotations of the given type are available
size_type LocARNA::MultipleAlignment::index ( const std::string &  name) const [inline]

Access index by name.

Precondition:
name exists
Parameters:
name  name of a sequence
Returns:
index of name/sequence pair with given name
void LocARNA::MultipleAlignment::init ( const AlignmentEdges &  edges,
const Sequence &  seqA,
const Sequence &  seqB,
bool  special_gap_symbols 
) [protected]

Initialize from alignment edges and sequences.

Parameters:
edges  alignment edges
seqA  sequence A
seqB  sequence B
special_gap_symbols  if true, use special distinct gap symbols for gaps due to loop deletion ('_') or sparsification ('~')

Test whether alignment is proper.

Returns:
whether all sequences have the same length
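
The properness test amounts to comparing every row length against the first. A minimal standalone sketch on plain strings follows; the name all_rows_same_length is hypothetical, and treating an empty alignment as proper is an assumption of the sketch, not something stated here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not LocARNA API): true if all rows have equal length.
// An alignment without rows is treated as proper (assumption).
bool all_rows_same_length(const std::vector<std::string> &rows) {
    if (rows.empty()) {
        return true;
    }
    const std::size_t len = rows.front().size();
    return std::all_of(rows.begin(), rows.end(),
                       [len](const std::string &row) { return row.size() == len; });
}
```

Since length() reports the length of the first sequence without checking the others, a check of this kind is what makes that value meaningful for the whole alignment.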
pos_type LocARNA::MultipleAlignment::length ( ) const [inline]

Length of multiple alignment.

Note:
Assumes proper alignment. Does not check whether all sequences have the same length!
Returns:
length of first sequence in alignment

normalize rna symbols

See also:
normalize_rna_sequence()

Normalize the symbols in all aligned sequences assuming that they code for RNA

static size_t LocARNA::MultipleAlignment::num_of_annotypes ( ) [inline, static]

number of annotation types

Returns:
number of annotation types

Number of rows of multiple alignment.

Returns:
number of rows
void LocARNA::MultipleAlignment::operator+= ( const AliColumn &  c)

Append a column.

Parameters:
c  column that is appended
void LocARNA::MultipleAlignment::operator+= ( char  c)

Append the same character to each row.

Parameters:
c  character that is appended
void LocARNA::MultipleAlignment::prepend ( const SeqEntry &  seqentry)

Prepend sequence entry.

Parameters:
seqentry  new sequence entry
Precondition:
*this is empty or the entry has the same length as *this
Note:
prepend is a lot more costly than append; its cost is linear in the number of rows
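
The cost difference follows from storing the rows in a vector. The sketch below illustrates this on a plain std::vector of (name, sequence) pairs; it is not LocARNA code, and the names Row, check_fits, append_row and prepend_row are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a sequence entry: (name, gapped sequence).
using Row = std::pair<std::string, std::string>;

// Precondition shared by both operations: the container is empty or the
// new entry has the same length as the existing rows.
void check_fits(const std::vector<Row> &rows, const Row &entry) {
    assert(rows.empty() || rows.front().second.size() == entry.second.size());
    (void)rows;  // silence unused-parameter warnings when asserts are disabled
    (void)entry;
}

// Appending at the end takes amortized constant time.
void append_row(std::vector<Row> &rows, const Row &entry) {
    check_fits(rows, entry);
    rows.push_back(entry);
}

// Inserting at the front shifts every existing row by one slot,
// so the cost grows linearly with the number of rows.
void prepend_row(std::vector<Row> &rows, const Row &entry) {
    check_fits(rows, entry);
    rows.insert(rows.begin(), entry);
}
```

When many rows have to be added in front, appending them and reordering once at the end avoids paying the shift for every single insertion.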
const SeqEntry& LocARNA::MultipleAlignment::seqentry ( size_type  index) const [inline]

Access name/sequence pair by index.

Precondition:
index in range 0..num_of_rows()-1
Parameters:
index  index of name/sequence pair (0-based)
Returns:
sequence (including gaps) with given index
const SeqEntry& LocARNA::MultipleAlignment::seqentry ( const std::string &  name) const [inline]

Access name/sequence pair by name.

Parameters:
name  name of name/sequence pair
Returns:
sequence (including gaps) with given name
void LocARNA::MultipleAlignment::set_annotation ( const AnnoType::type &  annotype,
const SequenceAnnotation &  annotation 
) [inline]

Write access to annotation.

Parameters:
annotype  annotation type
annotation  sequence annotation
Todo:
check that annotation is valid for multiple alignment; throw failure if annotation is not valid
double LocARNA::MultipleAlignment::sps ( const MultipleAlignment &  ma,
bool  compalign = true 
) const

Sum-of-pairs score between a multiple alignment and a reference alignment.

Parameters:
ma  multiple alignment
compalign  whether to compute score like compalign
Returns:
sum-of-pairs score of ma with respect to reference alignment *this
Note:
Whereas the sps score for compalign==false counts common matches only, the compalign score additionally counts common indels.
Precondition:
the sequences of ma have to occur in the alignment *this
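
The idea of counting common matches can be sketched for the pairwise case. The code below is a standalone illustration on plain strings, not LocARNA's implementation: the names is_gap_symbol, match_set and common_matches are hypothetical, the gap characters are an assumption, and it returns a raw count because the normalization used by sps() is not stated here.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Hypothetical helper (not LocARNA API); the gap characters are an assumption.
inline bool is_gap_symbol(char c) {
    return c == '-' || c == '_' || c == '~';
}

typedef std::set<std::pair<std::size_t, std::size_t> > MatchSet;

// Collect the matched position pairs (posA, posB), both 1-based, of a
// pairwise alignment given as two gapped rows.
MatchSet match_set(const std::string &rowA, const std::string &rowB) {
    MatchSet matches;
    std::size_t posA = 0;
    std::size_t posB = 0;
    const std::size_t len = std::min(rowA.size(), rowB.size());
    for (std::size_t col = 0; col < len; ++col) {
        const bool symA = !is_gap_symbol(rowA[col]);
        const bool symB = !is_gap_symbol(rowB[col]);
        if (symA) ++posA;
        if (symB) ++posB;
        if (symA && symB) {
            matches.insert(std::make_pair(posA, posB));
        }
    }
    return matches;
}

// Count the matches that the test alignment shares with the reference.
std::size_t common_matches(const std::string &refA, const std::string &refB,
                           const std::string &testA, const std::string &testB) {
    const MatchSet ref = match_set(refA, refB);
    const MatchSet test = match_set(testA, testB);
    std::size_t count = 0;
    for (MatchSet::const_iterator it = test.begin(); it != test.end(); ++it) {
        if (ref.count(*it) > 0) {
            ++count;
        }
    }
    return count;
}
```

A compalign-style variant would additionally count indels that both alignments have in common; that part is left out of this sketch.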
std::ostream & LocARNA::MultipleAlignment::write ( std::ostream &  out,
FormatType::type  format = MultipleAlignment::FormatType::CLUSTAL 
) const

Write alignment to stream.

Parameters:
out  output stream
format  alignment format; only CLUSTAL or STOCKHOLM; default: CLUSTAL (see also: FormatType)
Returns:
output stream

Writes one line "<name> <seq>" for each sequence; moreover, writes annotations.

Note:
does not write format header
std::ostream & LocARNA::MultipleAlignment::write ( std::ostream &  out,
size_t  width,
FormatType::type  format = MultipleAlignment::FormatType::CLUSTAL 
) const

Write alignment to stream (wrapped)

Parameters:
out  output stream
width  line width at which the output is wrapped
format  alignment format; only CLUSTAL or STOCKHOLM; default: CLUSTAL (see also: FormatType)
Returns:
output stream

Writes lines "<name> <seq>" per sequence, wraps lines at width

Note:
does not write format header
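
Wrapped output of "<name> <seq>" lines can be sketched as below. This is a standalone illustration, not LocARNA's implementation: the function name write_wrapped is hypothetical, rows are assumed to have equal length, and both the blank line between blocks and the padding of names to the longest name are assumptions of the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not LocARNA API): write rows of (name, sequence)
// in blocks of at most `width` columns. Assumes all sequences have the
// same length. No format header is written.
std::ostream &write_wrapped(
    std::ostream &out,
    const std::vector<std::pair<std::string, std::string> > &rows,
    std::size_t width) {
    if (rows.empty() || width == 0) {
        return out;
    }
    // Pad names to the longest name (assumption of this sketch).
    std::size_t namewidth = 0;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        namewidth = std::max(namewidth, rows[i].first.size());
    }
    const std::size_t len = rows.front().second.size();
    for (std::size_t start = 0; start < len; start += width) {
        if (start > 0) {
            out << '\n'; // blank line between consecutive blocks
        }
        for (std::size_t i = 0; i < rows.size(); ++i) {
            out << std::left << std::setw(static_cast<int>(namewidth))
                << rows[i].first << ' '
                << rows[i].second.substr(start, width) << '\n';
        }
    }
    return out;
}
```

Returning the stream, as the write methods do, allows calls to be chained with further output.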
std::ostream & LocARNA::MultipleAlignment::write ( std::ostream &  out,
size_type  start,
size_type  end,
FormatType::type  format = MultipleAlignment::FormatType::CLUSTAL 
) const

Write sub-alignment to stream.

Write from position start to position end to output stream out; write lines "<name> <seq>"

Parameters:
out  output stream
start  start column (1-based)
end  end column (1-based)
format  alignment format; default: CLUSTAL (see also: FormatType)
Returns:
output stream
void LocARNA::MultipleAlignment::write_debug ( std::ostream &  out = std::cout) const

Print contents of object to stream.

Parameters:
out  output stream
std::ostream & LocARNA::MultipleAlignment::write_name_sequence_line ( std::ostream &  out,
const std::string &  name,
const std::string &  sequence,
size_t  namewidth 
) const

Write formatted line of name and sequence.

The line is formatted such that it fits the output of the write methods.

Parameters:
out  output stream
name  name string
sequence  sequence string
Returns:
output stream
