LocARNA-1.9.2
LocARNA::RnaData Class Reference

Represents sparsified data of an RNA ensemble.

#include <rna_data.hh>

Inherited by LocARNA::ExtRnaData.


Public Types

typedef SparseMatrix< double > arc_prob_matrix_t
 arc probability matrix
typedef size_t size_type
 usual size type

Public Member Functions

 RnaData (const RnaEnsemble &rna_ensemble, double p_bpcut, double max_bps_length_ratio, const PFoldParams &pfoldparams)
 Construct from RnaEnsemble with cutoff probability.
 RnaData (const std::string &filename, double p_bpcut, double max_bps_length_ratio, const PFoldParams &pfoldparams)
 Construct from file.
 RnaData (const RnaData &rna_dataA, const RnaData &rna_dataB, const Alignment &alignment, double p_expA, double p_expB, bool only_local=false)
 Construct as consensus of two aligned RNAs.
virtual ~RnaData ()
 destructor
const Sequence & sequence () const
 Get the multiple alignment as sequence.
const MultipleAlignment & multiple_alignment () const
 Get the multiple alignment.
size_type length () const
 Get the sequence length.
double arc_cutoff_prob () const
 Get base pair cutoff probability.
double arc_prob (pos_type i, pos_type j) const
 Get arc probability.
std::string mea_structure (double gamma=1.) const
 maximum expected accuracy structure
vrna_plist_t * plist () const
 Construct plist (pair list of Vienna RNA)
double joint_arc_prob (pos_type i, pos_type j) const
 Get joint probability of arcs (i,j) and (i+1,j-1).
double stacked_arc_prob (pos_type i, pos_type j) const
 Get conditional probability of arc (i,j) given arc (i+1,j-1).
double prob_paired_upstream (pos_type i) const
 Probability that a position is paired upstream.
double prob_paired_downstream (pos_type i) const
 Probability that a position is paired downstream.
double prob_unpaired (pos_type i) const
 Unpaired probability.
std::ostream & write_pp (std::ostream &out, double p_outbpcut=0) const
 Write data in pp format.
std::ostream & write_size_info (std::ostream &out) const
 Write object size information.
bool has_stacking () const
 Availability of stacking terms.
void set_anchors (const SequenceAnnotation &anchors)
 Write access to alignment anchors.

Protected Types

typedef arc_prob_matrix_t::const_iterator arc_probs_const_iterator
 type of constant iterator over arcs with probability above cutoff

Protected Member Functions

 RnaData (double p_bpcut, size_t max_bp_span)
 Almost empty constructor.
arc_probs_const_iterator arc_probs_begin () const
 begin of arcs with probability above cutoff; supports iteration over arcs
arc_probs_const_iterator arc_probs_end () const
 end of arcs with probability above cutoff; supports iteration over arcs
virtual void init_from_fixed_structure (const RnaStructure &structure, const PFoldParams &pfoldparams)
 initialize from fixed structure
virtual void init_from_rna_ensemble (const RnaEnsemble &rna_ensemble, const PFoldParams &pfoldparams)
 initialize from rna ensemble
bool read_autodetect (const std::string &filename, const PFoldParams &pfoldparams)
 read and initialize from file, autodetect format
virtual bool inloopprobs_ok () const
 check in loop probabilities
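virtual void read_pp (const std::string &filename)
 Read data in pp format 2.0 from file.
virtual std::istream & read_pp (std::istream &in)
 Read data in pp format 2.0 from stream.
void read_old_pp (const std::string &filename)
 Read data in the old pp format.
void read_ps (const std::string &filename)
 Read data in Vienna's dot plot ps format.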

Protected Attributes

RnaDataImpl * pimpl_

Friends

class RnaDataImpl
class ExtRnaDataImpl

Detailed Description

Represents sparsified data of an RNA ensemble.

Knows the sequence, the cutoff probability, and the base pair probabilities greater than the cutoff probability; potentially knows stacking probabilities.

Note:
This class knows the cutoff probability *of its data*. This cutoff can differ from the cutoff in classes like BasePairs, which define the structure elements that are considered by algorithms.
The class guarantees that sequences are normalized (uppercase, T->U) even when read in unnormalized form, e.g. from a file or stream.
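
The normalization guarantee can be illustrated with a minimal sketch. The helper name normalize_rna is hypothetical and not part of LocARNA; the code only shows the documented effect (upper-casing and T->U translation), not how the library implements it.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not a LocARNA function): upper-case every character
// and translate T to U. Gap symbols and other characters pass through.
std::string normalize_rna(std::string seq) {
    for (char &c : seq) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        if (c == 'T') c = 'U';
    }
    return seq;
}
```

For example, normalize_rna("acgt-ACGT") yields "ACGU-ACGU".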

Constructor & Destructor Documentation

LocARNA::RnaData::RnaData ( const RnaEnsemble &  rna_ensemble,
double  p_bpcut,
double  max_bps_length_ratio,
const PFoldParams &  pfoldparams 
)

Construct from RnaEnsemble with cutoff probability.

Parameters:
rna_ensemble          RNA ensemble data
p_bpcut               cutoff probability
max_bps_length_ratio  max ratio of bps to length (0=no effect)
pfoldparams           folding parameters (controls stacking)
Note:
RnaData copies all required data from rna_ensemble and does not keep a reference; if pfoldparams.stacking is true, stacking terms are copied as well.
LocARNA::RnaData::RnaData ( const std::string &  filename,
double  p_bpcut,
double  max_bps_length_ratio,
const PFoldParams &  pfoldparams 
)

Construct from file.

Parameters:
filename              input file name
p_bpcut               cutoff probability
pfoldparams           folding parameters
max_bps_length_ratio  maximal ratio of number of base pairs divided by sequence length. This serves as a second filter on the "significant" base pairs. Value 0 turns this filter off.
Note:
The input format is autodetected; for fa or aln input formats, base pair probabilities are predicted.
Todo:
Consider allowing reading from istream; use istream::seekg(0) to reset the stream to the beginning (needed for format autodetection). Is there a problem due to fail on eofbit in C++98?
Todo:
Filter by maxBPspan when reading base pair probabilities from file. Currently, maxBPspan in pfoldparams is only respected when folding.
Note:
If probabilities have to be computed by folding the input sequence(s), folding is subject to the parameters pfoldparams. Also when reading ensemble probabilities from file, the outcome depends on settings in pfoldparams (stacking, max_bp_span). Without stacking, stacking probabilities are ignored and max_bp_span is used to filter the base pairs by their maximum span.
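
The effect of max_bps_length_ratio as a second filter can be sketched as follows. This is an assumption about the semantics, not the library's code: the sketch keeps at most floor(ratio * length) arcs and prefers the most probable ones. The names Arc and filter_by_ratio are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical arc record: positions i < j and pair probability p.
struct Arc {
    std::size_t i, j;
    double p;
};

// Assumed semantics: keep at most floor(ratio * length) arcs, dropping the
// least probable ones first. A ratio of 0 turns the filter off.
std::vector<Arc> filter_by_ratio(std::vector<Arc> arcs, std::size_t length,
                                 double max_bps_length_ratio) {
    assert(max_bps_length_ratio >= 0.0);
    if (max_bps_length_ratio == 0.0) return arcs;
    const std::size_t keep =
        static_cast<std::size_t>(max_bps_length_ratio * length);
    if (arcs.size() <= keep) return arcs;
    // Sort by descending probability, then cut off the tail.
    std::sort(arcs.begin(), arcs.end(),
              [](const Arc &a, const Arc &b) { return a.p > b.p; });
    arcs.erase(arcs.begin() + keep, arcs.end());
    return arcs;
}
```

With length 4 and ratio 0.5, at most two arcs survive; with ratio 0, the input is returned unchanged.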
LocARNA::RnaData::RnaData ( const RnaData &  rna_dataA,
const RnaData &  rna_dataB,
const Alignment &  alignment,
double  p_expA,
double  p_expB,
bool  only_local = false 
)

Construct as consensus of two aligned RNAs.

Parameters:
rna_dataA   RNA ensemble data A
rna_dataB   RNA ensemble data B
alignment   Alignment of A and B
p_expA      background probability for A
p_expB      background probability for B
only_local  if true, construct only local alignment

The object uses the mean cutoff probability of the given objects. The background probabilities are used in computing consensus probabilities. If both input RnaData objects have stacking probabilities, stacking consensus probabilities are computed as well. If the objects contain sequence anchors, the new object is constructed with a consensus anchor string. (The latter is done as part of the consensus sequence computation.)

LocARNA::RnaData::RnaData ( double  p_bpcut,
size_t  max_bp_span 
) [explicit, protected]

Almost empty constructor.

Parameters:
p_bpcut      cutoff probability
max_bp_span  maximum base pair span

Member Function Documentation

double LocARNA::RnaData::arc_cutoff_prob ( ) const

Get base pair cutoff probability.

Returns:
cutoff probability p_bpcut
double LocARNA::RnaData::arc_prob ( pos_type  i,
pos_type  j 
) const

Get arc probability.

Parameters:
i  left sequence position
j  right sequence position
Returns:
probability p_ij of base pair (i,j) if p_ij > p_bpcut; otherwise, 0
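
The cutoff semantics of arc_prob can be sketched with a small stand-in container. The class SparseArcProbs is hypothetical and uses std::map rather than LocARNA's SparseMatrix; it only illustrates that probabilities not exceeding p_bpcut are never stored and therefore read back as 0.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for a sparse arc probability matrix.
class SparseArcProbs {
public:
    explicit SparseArcProbs(double p_bpcut) : p_bpcut_(p_bpcut) {}

    // Store p only if it exceeds the cutoff; otherwise drop it.
    void set(std::size_t i, std::size_t j, double p) {
        assert(i < j);
        if (p > p_bpcut_) probs_[std::make_pair(i, j)] = p;
    }

    // Stored probability, or 0 for arcs that were filtered out.
    double arc_prob(std::size_t i, std::size_t j) const {
        auto it = probs_.find(std::make_pair(i, j));
        return it == probs_.end() ? 0.0 : it->second;
    }

    // Number of arcs actually stored.
    std::size_t size() const { return probs_.size(); }

private:
    double p_bpcut_;
    std::map<std::pair<std::size_t, std::size_t>, double> probs_;
};
```

Iterating over probs_ would visit exactly the arcs above the cutoff, which is the role the arc_probs_begin()/arc_probs_end() iterators play in this class.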

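arc_probs_const_iterator LocARNA::RnaData::arc_probs_begin ( ) const [protected]

Begin of arcs with probability above cutoff; supports iteration over arcs.

Returns:
constant iterator

arc_probs_const_iterator LocARNA::RnaData::arc_probs_end ( ) const [protected]

End of arcs with probability above cutoff; supports iteration over arcs.

Returns:
constant iterator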

bool LocARNA::RnaData::has_stacking ( ) const

Availability of stacking terms.

Returns:
whether stacking terms are available
void LocARNA::RnaData::init_from_fixed_structure ( const RnaStructure &  structure,
const PFoldParams &  pfoldparams 
) [protected, virtual]

initialize from fixed structure

Parameters:
structure    fixed structure
pfoldparams  folding parameters
  • stacking: whether to initialize stacking terms
Note:
can be overridden to initialize with additional information (in-loop probabilities)

Reimplemented in LocARNA::ExtRnaData.

void LocARNA::RnaData::init_from_rna_ensemble ( const RnaEnsemble &  rna_ensemble,
const PFoldParams &  pfoldparams 
) [protected, virtual]

initialize from rna ensemble

Parameters:
rna_ensemble  RNA ensemble
pfoldparams   folding parameters
  • stacking: whether to initialize stacking terms
Note:
can be overridden to initialize with additional information (in-loop probabilities)
This method *never* removes lonely or too-long base pairs (according to noLP or maxBPspan, respectively).

Reimplemented in LocARNA::ExtRnaData.

virtual bool LocARNA::RnaData::inloopprobs_ok ( ) const [inline, protected, virtual]

check in loop probabilities

Returns:
true iff loop probabilities are available or not required
Note:
use to indicate the need for recomputation in read_autodetect(); always true in RnaData

Reimplemented in LocARNA::ExtRnaData.

double LocARNA::RnaData::joint_arc_prob ( pos_type  i,
pos_type  j 
) const

Get joint arc probability.

Parameters:
i  left sequence position
j  right sequence position
Returns:
joint probability p^(2)_ij of base pairs (i,j) and (i+1,j-1) if p_{i+1,j-1} > p_bpcut and p^(2)_ij > p_bpcut; otherwise, 0

size_type LocARNA::RnaData::length ( ) const

Get the sequence length.

Returns:
length of RNA sequence
std::string LocARNA::RnaData::mea_structure ( double  gamma = 1.) const

maximum expected accuracy structure

Parameters:
gamma
Returns:
maximum non-crossing expected accuracy structure

Works as an interface to the RNAlib function MEA. From the ViennaRNA documentation: each base pair (i,j) gets a score 2*gamma*p_ij, and the score of an unpaired base is given by the probability of not forming a pair (compare RNAfold).
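
The scoring scheme quoted above can be made concrete with a small dynamic program. This is a Nussinov-style sketch under the stated scores, not the RNAlib implementation: the function name mea_sketch, the dense 0-based matrix layout, and the absence of a minimum hairpin loop size are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Illustrative only. p[i][j] with i < j (0-based) is the probability of
// base pair (i,j); all other entries are ignored.
std::string mea_sketch(const Matrix &p, double gamma = 1.0) {
    const std::size_t n = p.size();

    // q[i]: probability that position i forms no pair at all.
    std::vector<double> q(n, 1.0);
    for (std::size_t i = 0; i < n; ++i) {
        assert(p[i].size() == n);
        for (std::size_t j = i + 1; j < n; ++j) {
            q[i] -= p[i][j];
            q[j] -= p[i][j];
        }
    }

    // M[i][j]: best score of the half-open interval [i, j).
    // partner[i][j]: optimal partner of position j-1, or n if unpaired.
    Matrix M(n + 1, std::vector<double>(n + 1, 0.0));
    std::vector<std::vector<std::size_t> > partner(
        n + 1, std::vector<std::size_t>(n + 1, n));
    for (std::size_t len = 1; len <= n; ++len) {
        for (std::size_t i = 0; i + len <= n; ++i) {
            const std::size_t j = i + len;
            M[i][j] = M[i][j - 1] + q[j - 1];  // position j-1 unpaired
            for (std::size_t k = i; k + 1 < j; ++k) {
                if (p[k][j - 1] <= 0.0) continue;
                // pair (k, j-1) splits [i, j) into [i, k) and [k+1, j-1)
                const double s = M[i][k] + 2.0 * gamma * p[k][j - 1] +
                                 M[k + 1][j - 1];
                if (s > M[i][j]) {
                    M[i][j] = s;
                    partner[i][j] = k;
                }
            }
        }
    }

    // Traceback into dot-bracket notation.
    std::string structure(n, '.');
    std::vector<std::pair<std::size_t, std::size_t> > todo;
    todo.push_back(std::make_pair(std::size_t(0), n));
    while (!todo.empty()) {
        std::size_t i = todo.back().first;
        std::size_t j = todo.back().second;
        todo.pop_back();
        while (i < j) {
            const std::size_t k = partner[i][j];
            if (k == n) { --j; continue; }
            structure[k] = '(';
            structure[j - 1] = ')';
            todo.push_back(std::make_pair(k + 1, j - 1));
            j = k;
        }
    }
    return structure;
}
```

Under these scores, an isolated pair with probability p is selected exactly when 2*gamma*p exceeds 2*(1-p), i.e. when p > 1/(1+gamma); larger gamma therefore favors more base pairs.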

const MultipleAlignment & LocARNA::RnaData::multiple_alignment ( ) const

Get the multiple alignment.

Returns:
multiple alignment
vrna_plist_t * LocARNA::RnaData::plist ( ) const

Construct plist (pair list of Vienna RNA)

Note:
the plist has to be deleted by the caller
Returns:
pointer to plist
double LocARNA::RnaData::prob_paired_downstream ( pos_type  i) const

Probability that a position is paired downstream.

Parameters:
i  sequence position
Returns:
probability that a position i is paired with a position j<i (downstream)
Note:
O(sequence.length()) implementation
See also:
prob_paired_upstream
double LocARNA::RnaData::prob_paired_upstream ( pos_type  i) const

Probability that a position is paired upstream.

Parameters:
i  sequence position
Returns:
probability that a position i is paired with a position j>i (upstream)
Note:
O(sequence.length()) implementation
See also:
prob_paired_downstream
double LocARNA::RnaData::prob_unpaired ( pos_type  i) const

Unpaired probability.

Parameters:
i  sequence position
Returns:
probability that a position i is unpaired
Note:
O(sequence.length()) implementation
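
The three position-wise probabilities and their O(sequence.length()) cost can be sketched as plain sums over potential partners. The dense matrix layout and the function names are hypothetical; the sketch follows this documentation's terminology (upstream: partner j>i, downstream: partner j<i) and relies only on the fact that a position pairs with at most one partner, so the pairing events are disjoint.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// p[i][j] with i < j holds the probability of arc (i,j) (assumed layout).

// Probability that i pairs with some j > i: one pass over the row.
double paired_upstream(const Matrix &p, std::size_t i) {
    assert(i < p.size());
    double sum = 0.0;
    for (std::size_t j = i + 1; j < p.size(); ++j) sum += p[i][j];
    return sum;
}

// Probability that i pairs with some j < i: one pass over the column.
double paired_downstream(const Matrix &p, std::size_t i) {
    assert(i < p.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < i; ++j) sum += p[j][i];
    return sum;
}

// Complement of being paired in either direction.
double unpaired(const Matrix &p, std::size_t i) {
    return 1.0 - paired_upstream(p, i) - paired_downstream(p, i);
}
```

When the stored probabilities are filtered by a cutoff, sums computed this way underestimate the paired probabilities and overestimate the unpaired probability accordingly.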
bool LocARNA::RnaData::read_autodetect ( const std::string &  filename,
const PFoldParams &  pfoldparams 
) [protected]

read and initialize from file, autodetect format

Parameters:
filename     name of input file
pfoldparams  folding parameters
  • stacking: whether to initialize stacking terms
  • max_bp_span: maximum base pair span
Returns:
whether probabilities were read completely
Note:
This method is designed such that it can be used for both RnaData and ExtRnaData.
The method delegates the actual reading to the methods read_pp(), read_old_pp(), read_ps(), and to the MultipleAlignment class.
When reading, base pairs exceeding max_bp_span_ and structure information below the probability thresholds are ignored (which is, in part, the job of the delegates).
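
The filtering applied while reading can be sketched as a single predicate over candidate arcs. The names Arc and filter_on_read are hypothetical, and the span definition j - i + 1 is an assumption; the library may define the span differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical arc record: positions i < j and pair probability p.
struct Arc {
    std::size_t i, j;
    double p;
};

// Keep only arcs whose probability exceeds p_bpcut and whose span does not
// exceed max_bp_span. The span is assumed to be j - i + 1.
std::vector<Arc> filter_on_read(const std::vector<Arc> &arcs, double p_bpcut,
                                std::size_t max_bp_span) {
    std::vector<Arc> kept;
    for (const Arc &a : arcs) {
        assert(a.i < a.j);
        const std::size_t span = a.j - a.i + 1;
        if (a.p > p_bpcut && span <= max_bp_span) kept.push_back(a);
    }
    return kept;
}
```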
void LocARNA::RnaData::read_old_pp ( const std::string &  filename) [protected]

Read data in the old pp format

Parameters:
filename  name of input file

Reads only base pairs with probabilities greater than p_bpcut_; reads stacking probabilities only if has_stacking_ is true.

Note:
The old pp format starts with the sequence/alignment and then simply lists the arcs (i,j) with their probabilities p and, optionally, stacking probabilities p2. pp files contain entries i j p [p2], where p denotes the probability of base pair (i,j). The optional stacking probability p2 is the joint probability of base pairs (i,j) and (i+1,j-1).
Handling of stacking: after the call, has_stacking_ is true only if the file specified at least one stacking probability and has_stacking_ was true before.
Todo:
move implementation to impl class
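
Parsing a single entry of the form i j p [p2] can be sketched as follows. The struct PpEntry and the function parse_pp_entry are hypothetical; the sketch handles only one entry line and makes no claim about how the sequence/alignment part of the file is laid out or how LocARNA parses it.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical record for one entry line "i j p [p2]".
struct PpEntry {
    std::size_t i, j;
    double p;     // probability of base pair (i,j)
    bool has_p2;  // whether the optional stacking probability was given
    double p2;    // stacking probability, 0 if absent
};

// Parse one entry line. Returns false if the three mandatory fields are
// missing or malformed, or if the positions are not ordered i < j.
bool parse_pp_entry(const std::string &line, PpEntry &e) {
    std::istringstream in(line);
    if (!(in >> e.i >> e.j >> e.p)) return false;
    e.has_p2 = static_cast<bool>(in >> e.p2);
    if (!e.has_p2) e.p2 = 0.0;
    return e.i < e.j;
}

// Cutoff rule from the description above: keep only p > p_bpcut.
bool keep_entry(const PpEntry &e, double p_bpcut) { return e.p > p_bpcut; }
```

A reader along these lines would note whether any entry carried p2, which matches the documented rule that has_stacking_ stays true only if the file specified at least one stacking probability.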
void LocARNA::RnaData::read_pp ( const std::string &  filename) [protected, virtual]

Read data in pp format 2.0

Parameters:
filename  name of input file

Reads only base pairs with probabilities greater than p_bpcut_; reads stacking probabilities only if has_stacking_ is true.

Note:
can be overridden to read extension sections
pp is a proprietary format of LocARNA. In its simplest version, it starts with the sequence/alignment and then simply lists the arcs (i,j) with their probabilities p and, optionally, stacking probabilities p2. pp files contain entries i j p [p2], where p denotes the probability of base pair (i,j). The optional stacking probability p2 is the joint probability of base pairs (i,j) and (i+1,j-1).
Handling of stacking: after the call, has_stacking_ is true only if the file specified the STACK keyword and has_stacking_ was true before.
std::istream & LocARNA::RnaData::read_pp ( std::istream &  in) [protected, virtual]

Read data in pp format 2.0

Parameters:
in  input stream
See also:
read_pp(std::string)

Reimplemented in LocARNA::ExtRnaData.

void LocARNA::RnaData::read_ps ( const std::string &  filename) [protected]

Read data in Vienna's dot plot ps format

Parameters:
filename  name of input file

Reads only base pairs with probabilities greater than p_bpcut_; reads stacking probabilities only if has_stacking_ is true

Note:
reads sequence name from file (instead of guessing from filename!); stacking probabilities are read if available (then, sets has_stacking_ to true)
throws wrong_format exception if not in ps format
Todo:
move implementation to impl class

sequence characters should be upper case, and Ts translated to Us

const Sequence & LocARNA::RnaData::sequence ( ) const

Get the multiple alignment as sequence.

Returns:
sequence

void LocARNA::RnaData::set_anchors ( const SequenceAnnotation &  anchors )

Write access to alignment anchors.

Parameters:
anchors  alignment anchors
See also:
MultipleAlignment::set_annotation()
double LocARNA::RnaData::stacked_arc_prob ( pos_type  i,
pos_type  j 
) const

Get stacked (conditional) arc probability.

Parameters:
i  left sequence position
j  right sequence position
Returns:
conditional probability p_{ij|i+1,j-1} of base pair (i,j) given base pair (i+1,j-1) if p_{i+1,j-1} > p_bpcut and p^(2)_ij > p_bpcut; throws an exception if p_{i+1,j-1} <= p_bpcut; otherwise, 0
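
The relation between the joint and the conditional probability follows from the definition P(A|B) = P(A and B) / P(B). The sketch below mirrors the documented contract as a free function; the name stacked_prob and the choice of std::invalid_argument are assumptions, since the documentation does not name the exception type.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical free function. p_inner is the probability of the inner arc
// (i+1,j-1), p_joint the joint probability of (i,j) and (i+1,j-1).
double stacked_prob(double p_inner, double p_joint, double p_bpcut) {
    // Conditioning on an arc that was filtered out is an error.
    if (p_inner <= p_bpcut) {
        throw std::invalid_argument("inner arc probability not above cutoff");
    }
    // Joint probabilities at or below the cutoff are treated as 0.
    if (p_joint <= p_bpcut) return 0.0;
    return p_joint / p_inner;
}
```

For example, an inner arc probability of 0.8 and a joint probability of 0.6 give a conditional probability of 0.75.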
std::ostream & LocARNA::RnaData::write_pp ( std::ostream &  out,
double  p_outbpcut = 0 
) const

Write data in pp format

Parameters:
out         output stream
p_outbpcut  cutoff probability
Returns:
stream

Writes only base pairs with probabilities greater than p_outbpcut

std::ostream & LocARNA::RnaData::write_size_info ( std::ostream &  out) const

Write object size information.

Parameters:
out  output stream

Writes numbers of stored probabilities to stream

Reimplemented in LocARNA::ExtRnaData.


Member Data Documentation

RnaDataImpl * LocARNA::RnaData::pimpl_ [protected]

pointer to corresponding implementation object
