|
Deepak Bandyopadhyay is developing as his PhD thesis a framework for Modeling nearest-neighbor relationships in imprecise point sets such as proteins. For points in arbitrary dimensions, the Delaunay tessellation gives a precise geometric definition of neighbors using an empty sphere property.
The set of almost-Delaunay simplices, AD(e), provides possible neighboring k-tuples in the presence of a perturbation of magnitude e, that is, k-tuples that can be made Delaunay neighbors when all the points are allowed to move distances 0<=d<=e from their original locations. Delaunay probability defines the probability that points in an almost-Delaunay simplex are neighbors, given a probability distribution for the perturbation. These ideas are especially important in the analysis of protein structure, where the 3D Delaunay tessellation, traditionally used to define neighboring residues, is potentially unstable and may define very different neighbor sets for small perturbations in the input. Deepak's software for computing almost-Delaunay tetrahedra and Delaunay probability for 3D points is available from the project web page.
Working with Jun (Luke) Huan and Wei Wang, Deepak has adapted their frequent subgraph mining algorithm (which identifies spatial packing patterns characteristic of protein families) to graph representations based on almost-Delaunay edges. The almost-Delaunay graphs are sparser than contact maps and more tractable for the problem of subgraph mining, while overcoming robustness problems with the Delaunay graphs and thus finding larger and more meaningful residue packing patterns. A pattern that is frequent within a protein structural or functional family and rare in the rest of the PDB is called a "fingerprint" of the family. Fingerprints often contain functionally important residues, and a set of fingerprints of established families can be used for rapid automatic functional annotation of new structures determined by structural genomics projects. Deepak developed a new graph indexing and search method for doing this annotation. Working with Ruchir Shah, a student of Dr. Alex Tropsha, he has also mapped the almost-Delaunay tetrahedra to the protein sequence to identify regular expressions (similar to PROSITE patterns) of four neighboring residues that are characteristic of protein families.
Deepak has also extended his earlier work on using AD tetrahedra to demonstrate the robustness of Dr. Tropsha's statistical potential based on the Delaunay tessellation (SNAPP) for decoy discrimination. He showed quantitatively that SNAPP scores based on almost-Delaunay tetrahedra and Delaunay probability make a bigger distinction between native structures and the highest-scoring decoys, than scores based on the Delaunay tessellation alone. He also applied the earlier method of secondary structure assignment using patterns in almost-Delaunay tetrahedra to a large non-redundant set of proteins, to show that the method finds many irregular alpha-helices while missing few, mainly small helices found by the standard DSSP method. The AD assignment is consistent in structures with small differences in their coordinate values, such as structural homologs or an ensemble of NMR structures, where methods such as DSSP may have a divergent assignment.
This summer, Deepak will use almost-Delaunay tetrahedra to study protein conformational change and motion. In preliminary work with Dr. Charlie Carter, he was able to identify hinge regions in the protein TrpRS using almost-Delaunay tetrahedra. He will also complete the ongoing projects with Luke Huan and Ruchir Shah on protein family classification, functional annotation and finding sequence signatures using almost-Delaunay edge graphs and almost-Delaunay tetrahedra.