Most scoring functions used in protein fold recognition employ two-body (pseudo) potential energies. We reason that the use of higher-order terms may improve the performance of current algorithms. In our studies, each protein is represented by the side chain centroids of its amino acids. Delaunay tessellation of this representation defines all sets of nearest-neighbor quadruplets of amino acids. A four-body contact scoring function (log likelihoods of residue quadruplet compositions) is derived by analyzing a diverse set of proteins with known structures; we term this scoring function the Simplicial Neighborhood Analysis of Protein Packing (SNAPP) score. A test protein is characterized by its total score, calculated as the sum of the individual log likelihoods of its constituent amino acid quadruplets. We have demonstrated that the scoring function distinguishes native structures from partially unfolded or deliberately misfolded ones. It also discriminates among pre-transition-state, post-transition-state, and native structures in the folding simulation trajectories of Chymotrypsin Inhibitor 2 (CI2). In a recent study (Zhang et al., 2004) we have also demonstrated that liganded (closed) protein conformations are more stable (have higher SNAPP scores) than unliganded (open) conformations, an observation that agrees with experimental results.
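The scoring step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the expected frequency of a quadruplet is assumed to be the product of the individual residue frequencies times the number of distinct orderings of the quadruplet, and the base-10 logarithm is likewise an assumption, since the text does not state the exact formula. The sketch takes the tessellation as given, that is, as a list of four-index tuples such as those a Delaunay routine (for example `scipy.spatial.Delaunay` applied to the centroid coordinates) would return.

```python
import math
from collections import Counter


def quad_key(residues):
    """Order-independent key for a residue quadruplet, e.g. ('A','A','L','V')."""
    return tuple(sorted(residues))


def permutation_count(key):
    """Number of distinct orderings of the quadruplet: 4! / prod(t_n!)."""
    count = math.factorial(4)
    for multiplicity in Counter(key).values():
        count //= math.factorial(multiplicity)
    return count


def train_log_likelihoods(training_quads, residue_freq):
    """Log-likelihood per quadruplet type: log10(observed / expected).

    ASSUMPTION: expected frequency = permutation_count * product of the
    individual residue frequencies (random-association reference state).
    """
    counts = Counter(quad_key(q) for q in training_quads)
    total = sum(counts.values())
    table = {}
    for key, n in counts.items():
        observed = n / total
        expected = permutation_count(key) * math.prod(residue_freq[r] for r in key)
        table[key] = math.log10(observed / expected)
    return table


def total_score(sequence, simplices, table, default=0.0):
    """Sum the log likelihoods over all tessellation quadruplets of one protein.

    `simplices` holds 4-tuples of residue indices; quadruplet types absent
    from the table contribute `default` (an assumption of this sketch).
    """
    return sum(
        table.get(quad_key(sequence[i] for i in simplex), default)
        for simplex in simplices
    )


# Toy data, for illustration only (not real protein statistics).
toy_quads = [("A", "A", "L", "V")] * 3 + [("G", "G", "G", "G")]
toy_freq = {"A": 0.25, "L": 0.25, "V": 0.25, "G": 0.25}
toy_table = train_log_likelihoods(toy_quads, toy_freq)
toy_score = total_score("ALAVG", [(0, 1, 2, 3), (1, 2, 3, 4)], toy_table)
```

In this toy case only the first simplex matches a quadruplet type seen in training, so the total equals that single log likelihood. A real application would need a large, diverse training set and a deliberate policy for unseen quadruplet types.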