Detailed knowledge of the tertiary structure of a protein is required for an understanding of its biological function. Experimental data at atomic resolution can usually be obtained by X-ray crystallography and/or nuclear magnetic resonance techniques. It is not feasible, however, to determine experimentally the structures of the millions of proteins whose corresponding genes have been sequenced as part of the multiple genome projects. There is hope that biologically useful models can be derived by inference from the databases of known protein structures. This hope rests on the well-established observation that proteins with homologous sequences share similar structures. In such cases, a model for the unknown structure of a new protein (the target) can be derived from the structure of a homologous protein (the template) using comparative modeling techniques. This transfer of structural information is limited, however, in the variable regions of the sequence alignment between the target and template proteins. These regions usually result from substitutions, insertions, and deletions of residues between members of the same structural family, and they frequently correspond to exposed loop regions. We are developing a method for protein loop building that combines an ab initio approach with a database approach. Following ab initio procedures, we include an exhaustive search of a discretized version of the C-space. We represent each candidate loop as a sequence of rigid building blocks that are concatenated without any degrees of freedom. Following the database approach, the building blocks are chosen from libraries of short protein backbone fragments, which represent protein chains accurately and economically. We are examining the trade-off between the feasibility of an exhaustive search and the accuracy of the loops that are built. We are also exploring ways to identify the native-like loop among all the candidate loops generated exhaustively.
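The exhaustive search over rigid building blocks can be illustrated with a minimal sketch. It is not the method's actual implementation: it assumes each building block can be summarized as a rigid transform (a rotation plus a translation from the block's start frame to its end frame), and the three-block library, the function names (`compose`, `rot_z`, `enumerate_loops`), the 3.8 Å step, and the closure tolerance are all hypothetical toy choices. A real library would hold multi-residue backbone fragments.

```python
import itertools
import math

# ASSUMPTION: a building block is modeled as a rigid transform (R, t), i.e. the
# frame at the end of the block expressed in the frame at its start.
IDENTITY = (((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)), (0.0, 0.0, 0.0))


def compose(a, b):
    """Chain block b onto the end of block a; no free parameters are involved."""
    Ra, ta = a
    Rb, tb = b
    R = tuple(
        tuple(sum(Ra[i][k] * Rb[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )
    t = tuple(ta[i] + sum(Ra[i][k] * tb[k] for k in range(3)) for i in range(3))
    return R, t


def rot_z(deg):
    """Rotation matrix about the z axis (toy helper for the hypothetical library)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


# Hypothetical toy library: every block advances 3.8 units along its local x axis
# and then leaves the chain pointing straight ahead, turned left, or turned right.
STEP = (3.8, 0.0, 0.0)
LIBRARY = {
    "straight": (rot_z(0.0), STEP),
    "left": (rot_z(90.0), STEP),
    "right": (rot_z(-90.0), STEP),
}


def enumerate_loops(library, n_blocks, target, tol):
    """Try all len(library)**n_blocks block sequences; keep those ending near target."""
    hits = []
    for combo in itertools.product(library.items(), repeat=n_blocks):
        frame = IDENTITY
        for _, block in combo:
            frame = compose(frame, block)
        gap = math.sqrt(sum((p - q) ** 2 for p, q in zip(frame[1], target)))
        if gap <= tol:
            hits.append((tuple(name for name, _ in combo), gap))
    return sorted(hits, key=lambda h: h[1])


# Usage: two-block loops that must end at the (hypothetical) anchor point.
hits = enumerate_loops(LIBRARY, n_blocks=2, target=(3.8, 3.8, 0.0), tol=0.1)
for names, gap in hits:
    print(names, round(gap, 3))
```

The number of candidates grows as the library size raised to the number of blocks, which is the trade-off noted above: a smaller library keeps the enumeration feasible, while a larger one approximates real loops more closely. This sketch filters only on closure of the end point; ranking the surviving candidates to find a native-like loop would need a separate scoring step that is not shown here.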