The talk deals with the process of finding homologs of non-coding RNA
molecules from one species within another, evolutionarily closely related
species. Unlike genes, non-coding RNAs fold into a specific secondary
structure that defines their function. The difficulties of aligning
sequence and structure in the same step are elucidated, and an algorithm
based on the widely used Sankoff algorithm is developed and implemented
in a tool called "Locarna-Scan".
The functionality of this tool is verified by scanning an artificial
genome for hidden sequences, and its performance, as well as its search
sensitivity and specificity, is compared against other tools, namely
Infernal and RSearch. In the last part, this work tackles
a real biological problem: finding a relatively new short RNA that is
involved in the carbon storage regulatory circuit. This
RNA, named CsrB, behaves differently from common short RNAs, which bind
directly to the target messenger RNA and thus regulate gene
expression by blocking the ribosome's access to the mRNA. Instead,
CsrB can be seen as a second level of regulation: it sequesters the CsrA
molecule, which normally regulates the expression of various genes. Even though
CsrA proteins are known in many bacterial organisms, the CsrB RNA
seems to be absent in most of them. It is therefore of
interest to check whether there are more, as yet unknown structures
similar to CsrB.
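The evaluation on an artificial genome with hidden sequences can be made concrete with a minimal sketch. It assumes a per-nucleotide definition of sensitivity and specificity and half-open `(start, end)` intervals; the function name `scan_metrics` and the example coordinates are hypothetical and are not taken from Locarna-Scan, Infernal or RSearch.

```python
def scan_metrics(genome_length, planted, hits):
    """Per-nucleotide sensitivity and specificity of reported scan hits.

    planted and hits are lists of half-open intervals (start, end).
    This scoring convention is an assumption made for illustration.
    """
    truth = [False] * genome_length   # positions covered by a hidden sequence
    called = [False] * genome_length  # positions covered by a reported hit
    for start, end in planted:
        for i in range(start, end):
            truth[i] = True
    for start, end in hits:
        for i in range(start, end):
            called[i] = True

    tp = sum(t and c for t, c in zip(truth, called))
    fn = sum(t and not c for t, c in zip(truth, called))
    fp = sum(c and not t for t, c in zip(truth, called))
    tn = genome_length - tp - fn - fp

    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Two hidden sequences; the tool finds one and reports one spurious hit.
print(scan_metrics(100, [(10, 20), (50, 60)], [(10, 20), (70, 75)]))
# prints (0.5, 0.9375)
```

Other conventions, such as counting a hit as correct once it overlaps a hidden sequence by some fraction, would give different numbers for the same tool output.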
The organization of protein structures in protein genotype space is well studied. The same does not hold for protein functions, whose organization is important for understanding how novel protein functions can arise through blind evolutionary searches of sequence space. In systems other than proteins, two organizational features of genotype space are known to facilitate phenotypic innovation. The first is that genotypes with the same phenotype form vast and connected genotype networks. The second is that different neighborhoods in this space contain different novel phenotypes. We here characterize the organization of enzymatic functions in protein genotype space, using a data set of more than 30,000 proteins with known structure and function. We show that different neighborhoods of genotype space contain proteins with very different functions. This property both facilitates evolutionary innovation through exploration of a genotype network and constrains the evolution of novel phenotypes. The phenotypic diversity of different neighborhoods arises because some functions can be carried out by multiple structures. We show that the space of protein functions is not homogeneous, and that different genotype neighborhoods tend to contain a different spectrum of functions, whose diversity increases with increasing distance between these neighborhoods in sequence space. Whether a protein with a given function can evolve specific new functions is thus determined by the protein's location in sequence space.
Partitioning biological data objects into groups such that the objects within each group share common traits is a long-standing challenge in computational biology, especially in light of the rapidly increasing amount of NGS data in genomics and transcriptomics. Recently, we developed and established Transitivity Clustering, a partitioning approach based on Weighted Transitive Graph Projection that uses a single similarity threshold as its density parameter. It provides integrated access to various functions addressing each step of a typical cluster analysis. To facilitate this, Transitivity Clustering offers three user-friendly interfaces: a powerful stand-alone version, a web interface and a collection of Cytoscape plug-ins. It is accessible online at http://transclust.cebitec.uni-bielefeld.de. In the talk, we will demonstrate how our approach helps at each step of a typical biomedical cluster analysis. In particular, we will illustrate how Transitivity Clustering can be used to identify protein families from sequence data, cancer subtypes in gene expression data and, finally, protein complexes in PPI networks.
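The role of the single similarity threshold can be illustrated with a minimal sketch. It assumes the common formulation of weighted transitive graph projection in which two objects are connected when their similarity exceeds the threshold, and in which adding or removing a connection costs the absolute difference between similarity and threshold. The function `projection_cost`, the default similarity of 0.0 for unlisted pairs and the toy data are hypothetical; this only scores a given partition and is not the Transitivity Clustering implementation.

```python
from itertools import combinations


def projection_cost(similarity, clusters, threshold):
    """Cost of editing the thresholded similarity graph into the given clusters.

    similarity maps unordered pairs (u, v) to a score; pairs that are not
    listed are treated as similarity 0.0 (an assumption of this sketch).
    """
    label = {obj: idx for idx, members in enumerate(clusters) for obj in members}
    cost = 0.0
    for u, v in combinations(sorted(label), 2):
        s = similarity.get((u, v), similarity.get((v, u), 0.0))
        has_edge = s > threshold
        same_cluster = label[u] == label[v]
        # An edge between clusters must be removed; a missing edge inside
        # a cluster must be added. Both cost |s - threshold|.
        if has_edge != same_cluster:
            cost += abs(s - threshold)
    return cost


similarity = {
    ("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.3,
    ("c", "d"): 0.6, ("a", "d"): 0.1, ("b", "d"): 0.2,
}
for clusters in ([{"a", "b", "c"}, {"d"}], [{"a", "b", "c", "d"}]):
    print(clusters, round(projection_cost(similarity, clusters, 0.5), 3))
```

Under these assumptions a lower cost means the partition stays closer to the thresholded similarity graph, and raising the threshold tends to favor smaller, tighter clusters.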