Transcriptional Regulation

The expression of most eukaryotic genes is regulated by a complex system of cooperating transcription factors (TFs). They bind to specific binding sites (TFBS), most of which are located upstream of a gene's coding sequence. The cooperation of particular TFs constrains the relative location and orientation of their binding sites. We are developing a modular system for representing and predicting such regulatory DNA sequences. The system should be able to take into account various biological properties of these sequences that are related to their function. The main goal is to improve the prediction of gene regulation processes.

Modelling and Predicting Transcription Factor Binding Sites

The basic units of our modular system are transcription factor binding sites (TFBS), whose properties are modelled stochastically using Bayesian networks. In contrast to the commonly used weight matrices (PSSM), these Bayesian networks are flexible in representing features from diverse domains and allow conditional dependencies among these features to be modelled. It is mainly this flexibility that was exploited successfully to improve TFBS prediction performance compared to PSSM. The binding site features considered include sequence-dependent structural parameters of DNA and higher-order sequence properties of TFBS and their neighbouring contexts. To select an optimal subset of the possible binding site features from which to construct the Bayesian networks, we employed feature selection algorithms.

Figure: Bayesian network TFBS model
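The difference between a weight matrix and a model with conditional dependencies can be illustrated with a minimal sketch. It is not the published model: it assumes a chain-structured Bayesian network in which each nucleotide depends only on its left neighbour, uses nucleotides as the only feature, and works on made-up toy binding sites. All function names are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def train_pssm(sites, pseudocount=1.0):
    """Per-position nucleotide probabilities; positions are treated as independent."""
    total = len(sites) + pseudocount * len(ALPHABET)
    model = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        model.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return model

def pssm_log_prob(model, seq):
    """Log-probability of seq under the independent-position model."""
    return sum(math.log(column[base]) for column, base in zip(model, seq))

def train_chain(sites, pseudocount=1.0):
    """Chain-structured network: position i is conditioned on position i-1."""
    first = train_pssm([site[0] for site in sites], pseudocount)[0]
    transitions = []
    for i in range(1, len(sites[0])):
        pairs = Counter((site[i - 1], site[i]) for site in sites)
        table = {}
        for prev in ALPHABET:
            total = sum(pairs[(prev, b)] for b in ALPHABET) + pseudocount * len(ALPHABET)
            table[prev] = {b: (pairs[(prev, b)] + pseudocount) / total for b in ALPHABET}
        transitions.append(table)
    return first, transitions

def chain_log_prob(model, seq):
    """Log-probability of seq under the chain model."""
    first, transitions = model
    log_p = math.log(first[seq[0]])
    for i, table in enumerate(transitions, start=1):
        log_p += math.log(table[seq[i - 1]][seq[i]])
    return log_p

# Toy sites (invented): the first two positions always co-occur as AC or TG.
toy_sites = ["ACGT", "ACGT", "TGGT", "TGGT"]
pssm = train_pssm(toy_sites)
chain = train_chain(toy_sites)

# "AGGT" mixes the two variants and never occurs among the toy sites.
pssm_gap = pssm_log_prob(pssm, "ACGT") - pssm_log_prob(pssm, "AGGT")
chain_gap = chain_log_prob(chain, "ACGT") - chain_log_prob(chain, "AGGT")
print(f"PSSM log-probability gap:  {pssm_gap:.3f}")
print(f"Chain log-probability gap: {chain_gap:.3f}")
```

On this toy data the weight matrix cannot distinguish the observed site from the mixed one, because each column is scored on its own, whereas the chain model assigns the mixed site a lower probability.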

Contributing group members

Main Publications

R. Pudimat, E. G. Schukat-Talamazzini and R. Backofen (2005) A multiple-feature framework for modelling and predicting transcription factor binding sites. Bioinformatics, 21(14), 3082-8

Cooperations

E.G. Schukat-Talamazzini, Chair for Pattern Analysis, Friedrich-Schiller-University Jena, Germany

Funding

Joint project: JCB (FSU) Jenaer Centrum für Bioinformatik - FSU/Kern, Project D. 5 (BMBF): Stochastic, constraint-based approaches for describing regulatory sequences

Modelling Promoter Logic

It is a common assumption in current research that promoters are more or less non-functional sequences upstream of coding sequences which contain modules of transcription factor binding sites. These groupings of TFBS are determined by functional relations between the binding factors. Two circumstances make it difficult to learn and predict TFBS modules. First, the relative orientation and distance of the individual TFBS within such a module are highly variable, so deterministic modelling of intra-module relations is not promising. Second, the available data are very limited, so statistical learning of these modules is not promising either. These two facts motivated us to develop an approach that stochastically models logical relations between the members of a module in a specially designed Bayesian network. The parameters of such a network are not trained from data but are fixed probabilities that express the truth values true and false. By evaluating all the modelled knowledge, the system provides a probability distribution for each TFBS type at each position of an input sequence. This probability distribution is used as a position-specific prior distribution for the individual TFBS models. The effect of this approach is to amplify TFBS model hits whose logical module constraints are fulfilled and to penalise the remaining hits as likely false positives.

Figure: Promoter structure
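How a position-specific prior can amplify or penalise hits of a single TFBS model can be sketched with Bayes' rule. This is only an illustration under stated assumptions: the TFBS model is assumed to report a log-likelihood ratio (site versus background) per position, the prior values are invented rather than derived from a promoter logic network, and the function names are hypothetical.

```python
import math

def posterior_site_probability(log_likelihood_ratio, prior):
    """Combine a TFBS model score with a position-specific prior via Bayes' rule.

    log_likelihood_ratio: log P(window | site) - log P(window | background)
    prior: prior probability of a site at this position, strictly between 0 and 1
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    posterior_odds = math.exp(log_likelihood_ratio) * prior / (1.0 - prior)
    return posterior_odds / (1.0 + posterior_odds)

def rescore_hits(log_likelihood_ratios, priors):
    """Apply the prior position by position to a list of model scores."""
    return [
        posterior_site_probability(llr, prior)
        for llr, prior in zip(log_likelihood_ratios, priors)
    ]

# Two hits with identical model scores (invented numbers).
scores = [2.0, 2.0]
# Hypothetical priors: high where the module constraints hold, low where they do not.
priors = [0.5, 0.01]
supported, unsupported = rescore_hits(scores, priors)
print(f"hit with fulfilled constraints:   {supported:.3f}")
print(f"hit with unfulfilled constraints: {unsupported:.3f}")
```

Although both hits receive the same score from the TFBS model alone, the one at a position with a low prior ends up with a much smaller posterior probability, which is the penalising effect described above.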

Contributing group members

Funding

Joint project: JCB (FSU) Jenaer Centrum für Bioinformatik - FSU/Kern, Project D. 5 (BMBF): Stochastic, constraint-based approaches for describing regulatory sequences