Libraries of protein fragments for structural modeling
(Levitt; Guibas, Kolodny, Koehl)

Prediction of protein structure depends on the accuracy and complexity of the models used. We represent a polypeptide chain as a sequence of rigid fragments that are concatenated without any degrees of freedom. Fragments chosen from a library of representative fragments are fitted to the native structure using a greedy build-up method. This gives a one-dimensional representation of the three-dimensional native structure, whose quality depends on the nature of the library. Each library is characterized by its quality of fit (accuracy) and by the number of allowed states per residue (complexity). Our goal is to find representations that are both accurate and economical (low complexity). We recently constructed a library that outperforms earlier libraries in this regard: with 10 states per residue we approximate native protein structure to 1 angstrom, whereas more than 20 states per residue were needed previously.

We plan to extend these models to approximate the full backbone chain, rather than only the C-alpha atoms approximated by the current model. This can be done by describing each library fragment as a series of dihedral angles. By exploiting the correlation between the angles of neighboring residues along the polypeptide chain, one can produce simple and accurate libraries.
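The greedy build-up fit and the complexity measure described above can be illustrated with a deliberately simplified sketch. It is not our actual procedure: it assumes a two-dimensional toy chain with unit bond lengths, in which a fragment is a tuple of turn angles, and it scores candidates without any superposition. The names extend, rmsd, greedy_fit and states_per_residue are hypothetical, and the complexity formula assumes that every fragment adds the same number of residues.

```python
import math

def extend(state, fragment):
    """Append a rigid fragment (a tuple of turn angles) to the chain end.

    state is (x, y, heading). Returns the new points and the new end state.
    Toy assumption: 2D chain, unit bond length.
    """
    x, y, heading = state
    points = []
    for turn in fragment:
        heading += turn
        x += math.cos(heading)
        y += math.sin(heading)
        points.append((x, y))
    return points, (x, y, heading)

def rmsd(a, b):
    """Root-mean-square distance over the common prefix, without superposition."""
    n = min(len(a), len(b))
    total = sum((a[i][0] - b[i][0]) ** 2 + (a[i][1] - b[i][1]) ** 2
                for i in range(n))
    return math.sqrt(total / n)

def greedy_fit(target, library):
    """At each step keep the single fragment that gives the lowest RMSD so far.

    Returns the chosen fragment labels (the one-dimensional representation),
    the built points, and the final RMSD to the target.
    """
    state = (0.0, 0.0, 0.0)
    built, labels = [], []
    while len(built) < len(target):
        best = None
        for idx, frag in enumerate(library):
            pts, new_state = extend(state, frag)
            trial = (built + pts)[:len(target)]
            score = rmsd(trial, target)
            if best is None or score < best[0]:
                best = (score, idx, pts, new_state)
        _, idx, pts, state = best
        built.extend(pts)
        labels.append(idx)
    built = built[:len(target)]
    return labels, built, rmsd(built, target)

def states_per_residue(library_size, residues_added):
    """Complexity, assuming each fragment choice adds residues_added residues."""
    return library_size ** (1.0 / residues_added)

if __name__ == "__main__":
    # Hypothetical three-fragment library: straight, left turn, right turn.
    library = [(0.0, 0.0), (0.5, 0.5), (-0.5, -0.5)]
    target = [(1.0, 0.1), (2.0, 0.3), (2.8, 0.9), (3.3, 1.8)]
    labels, built, err = greedy_fit(target, library)
    print(labels, round(err, 3), round(states_per_residue(len(library), 2), 3))
```

Keeping only the single best fragment at each step is the simplest form of a greedy build-up; retaining several of the best partial chains at each step is a natural variant at higher cost.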

The library fragments can be used to model, and possibly predict, protein loop structure. The library of fragments, along with the construction scheme for appending fragments, allows the enumeration of all possible structures of a fixed length. The number of such structures is exponential in the length of the chain, but loops are fairly short, so the total remains manageable. To improve performance further, we may construct a loop simultaneously from its two ends.
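The benefit of building a loop from both ends can be sketched as follows. This is an illustration under strong assumptions, not our loop-modeling code: the chain lives on a two-dimensional square lattice so that end states compare exactly, a fragment is a tuple of quarter turns, and the loop anchors are given as (x, y, heading) states. The names walk, walk_back, close_loop_brute and close_loop_two_ended are hypothetical. Full enumeration examines s^n label sequences for a library of s fragments and a loop of n fragments, whereas the two-ended construction enumerates only about 2 * s^(n/2) half-loops and joins those that meet in the same state.

```python
from collections import defaultdict
from itertools import product

# Unit steps for the four lattice headings (east, north, west, south).
STEPS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def walk(state, fragments):
    """Append rigid fragments (tuples of quarter turns) to state (x, y, d)."""
    x, y, d = state
    for frag in fragments:
        for turn in frag:
            d = (d + turn) % 4
            x, y = x + STEPS[d][0], y + STEPS[d][1]
    return (x, y, d)

def walk_back(state, fragments):
    """Return the state the chain must have had before these fragments."""
    x, y, d = state
    for frag in reversed(fragments):
        for turn in reversed(frag):
            x, y = x - STEPS[d][0], y - STEPS[d][1]
            d = (d - turn) % 4
    return (x, y, d)

def close_loop_brute(start, end, library, n):
    """Enumerate all len(library)**n label sequences that end exactly at end."""
    ids = range(len(library))
    return [labels for labels in product(ids, repeat=n)
            if walk(start, [library[i] for i in labels]) == end]

def close_loop_two_ended(start, end, library, n):
    """Grow half of the loop from each anchor and join halves that meet."""
    n_fwd = n // 2
    n_bwd = n - n_fwd
    ids = range(len(library))
    # Index every forward half-loop by the state it reaches.
    meet = defaultdict(list)
    for head in product(ids, repeat=n_fwd):
        meet[walk(start, [library[i] for i in head])].append(head)
    # Grow backward from the end anchor and look up matching forward halves.
    solutions = []
    for tail in product(ids, repeat=n_bwd):
        mid = walk_back(end, [library[i] for i in tail])
        for head in meet.get(mid, []):
            solutions.append(head + tail)
    return solutions

if __name__ == "__main__":
    # Hypothetical four-fragment library of two-residue lattice fragments.
    library = [(0, 0), (1, 0), (-1, 0), (0, 1)]
    start = (0, 0, 0)
    end = walk(start, [library[i] for i in (0, 1, 2, 1, 3, 0)])
    found = close_loop_two_ended(start, end, library, 6)
    print(len(found), sorted(found) == sorted(close_loop_brute(start, end, library, 6)))
```

With continuous coordinates the exact dictionary lookup would have to be replaced by a tolerance-based match, for example by binning the meeting states; that step is omitted here.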

Decoy structures can also be modeled with these libraries, by randomly sampling the structure space. As noted, there are exponentially many possible structures, which renders uniform random sampling ineffective. Alternatively, we can bias the sampling of structures by the predicted secondary structure of the protein considered. We believe that this method can serve as an effective starting point for other decoy-refinement schemes.
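One way such biased sampling might look is sketched below. The sketch rests on assumptions that are not part of the libraries described here: each library fragment is taken to carry a secondary-structure class label (H, E or C), the prediction is given as a string with one such letter per residue, and a single hypothetical parameter, bias, sets how strongly matching fragments are favored. Setting bias to 1 recovers uniform sampling.

```python
import random

def sample_decoy_labels(predicted_ss, fragment_classes, residues_per_fragment,
                        bias=5.0, rng=None):
    """Draw one fragment label per window of the chain.

    predicted_ss: string of per-residue classes, e.g. "HHHHEEEE".
    fragment_classes: assumed class label of each library fragment.
    bias: relative weight of fragments whose class matches the prediction.
    """
    rng = rng or random.Random()
    ids = range(len(fragment_classes))
    labels = []
    for pos in range(0, len(predicted_ss), residues_per_fragment):
        window = predicted_ss[pos:pos + residues_per_fragment]
        # Majority class in the window (ties broken alphabetically).
        wanted = max(sorted(set(window)), key=window.count)
        weights = [bias if c == wanted else 1.0 for c in fragment_classes]
        labels.append(rng.choices(ids, weights=weights, k=1)[0])
    return labels

if __name__ == "__main__":
    # Hypothetical annotation of a three-fragment library.
    fragment_classes = ["H", "E", "C"]
    rng = random.Random(0)
    for _ in range(3):
        print(sample_decoy_labels("HHHHEEEECC", fragment_classes, 2, rng=rng))
```

Each sampled label sequence would then be converted into coordinates by the same concatenation scheme used for fitting, and passed on to a refinement step.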