Bioinformatics Group Freiburg

Sequence Data - HP sequence classification via folding properties

We used a classification procedure based on thermodynamic and kinetic features to identify protein-like sequences in the 3D-cubic HP-model. The following properties are tested:

non-degenerate ground state (unique minimum free energy structure)
good folder (ability to adopt this structure in short-time folding simulations)
sequential folding accessibility of this structure along low-barrier paths

These properties ensure a thermodynamically stable native structure (the unique mfe) and the ability to fold into this functional conformation within a short time interval, as required by the short life cycles of biomolecules. Furthermore, the sequential assembly of proteins is considered. There is evidence for co-translational folding during elongation, which should restrict the accessible folding space. Thus we are only interested in sequences that are able to form their native structure via sequential folding without high energy barriers in the traversed energy landscape.
A sequence fulfilling all criteria is called protein-like. If the ground state is not reachable sequentially but is reached via global folding at a high rate, the sequence is classified as a good folder. Bad folders are not able to adopt the native structure in a short time interval. All checked sequences are non-degenerate, i.e. they have a unique ground state.
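
The decision rule described above can be written down compactly. The following is a minimal Python sketch, not the LatPack implementation: the names FoldingProperties and classify are hypothetical, and the three inputs are assumed to come from a prior ground-state enumeration and folding simulations. The handling of a sequence that is sequentially accessible but not a fast global folder (treated here as a bad folder) is an assumption that the text does not settle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoldingProperties:
    """Hypothetical container for precomputed results of one HP sequence."""
    ground_state_count: int        # number of distinct minimum-energy structures
    folds_fast: bool               # reaches the ground state in short folding simulations
    sequentially_accessible: bool  # reachable by sequential folding along low-barrier paths


def classify(props: FoldingProperties) -> str:
    """Map the three tested properties to the classes named in the text."""
    if props.ground_state_count != 1:
        return "degenerate"    # not part of the checked sequence set
    if not props.folds_fast:
        return "bad folder"    # assumption: also covers slow, sequentially accessible cases
    if props.sequentially_accessible:
        return "protein-like"  # all three criteria fulfilled
    return "good folder"       # fast global folding only
```

For example, classify(FoldingProperties(1, True, False)) yields "good folder", because the ground state is unique and reached quickly but not along a sequential pathway.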

Main Publications

Martin Mann, Daniel Maticzka, Rhodri Saunders, and Rolf Backofen.
Classifying protein-like sequences in arbitrary lattice protein models using LatPack.
In HFSP Journal, 2 no. 6 pp. 396, 2008.
Special issue on protein folding: experimental and theoretical approaches.

Supplementary data can be obtained HERE.

HP in unrestricted 3D-cubic

Sequence length 27: protein-like, good folding, bad folding, and the whole list of non-degenerate sequences

Benchmark set for Protein Chain Lattice Fitting (PCLF) Problem

This is the set of high-resolution protein structures used for benchmarking tools that solve the Protein Chain Lattice Fitting (PCLF) problem (see publication below).
The test set was taken from the PISCES web server (Wang and Dunbrack, 2005). We enforced a 40% sequence identity cutoff, chain length 50–300, R-factor ≤ 0.3, and resolution ≤ 1.5 Å to derive a high-quality set of proteins to model. Given our requirement for side chains, chains containing only C_alpha atoms were ignored. The resulting benchmark set contains 1198 proteins with a mean length of 160.
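
The per-chain selection criteria listed above can be expressed as a simple filter. This is a minimal Python sketch under stated assumptions: ChainRecord and passes_filters are hypothetical names, the record fields are assumed to have been extracted beforehand (for instance from a PISCES result list and the structure files), and the 40% sequence identity cutoff is not modeled because it is a property of the whole set that the culling server applies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainRecord:
    """Hypothetical summary of one protein chain."""
    chain_id: str
    length: int          # number of residues
    r_factor: float
    resolution: float    # in Angstrom
    c_alpha_only: bool   # True if the chain lacks side-chain atoms


def passes_filters(chain: ChainRecord) -> bool:
    """Apply the per-chain thresholds stated in the text."""
    return (
        50 <= chain.length <= 300
        and chain.r_factor <= 0.3
        and chain.resolution <= 1.5
        and not chain.c_alpha_only   # side chains are required
    )


def select_benchmark(chains: list[ChainRecord]) -> list[ChainRecord]:
    """Keep only the chains that satisfy every per-chain criterion."""
    return [c for c in chains if passes_filters(c)]
```

A chain of length 160 with R-factor 0.2 and resolution 1.2 would be kept, whereas the same chain at resolution 2.0 or with only C_alpha atoms would be dropped.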

Main Publications

Martin Mann, Rhodri Saunders, Cameron Smith, Rolf Backofen, and Charlotte M. Deane.
Producing high-accuracy lattice models from protein atomic co-ordinates including side chains.
In Advances in Bioinformatics, vol. 2012, Article ID 148045, 6 pages, 2012. MM and RS contributed equally to this work.

Contact

In case of questions, comments, or contributions to this page, please contact Martin Mann.