![]() |
| Structure extraction from crystallography density data |
| (Guibas; Brunger, Russell) |
The process of going from the electron density data produced by X-ray interference measurements to the protein structure in three-dimensional space is still labor-intensive, and we are investigating geometric techniques to help automate it. By taking medial axes of isosurfaces of the density data and simplifying them, we can create an estimate of the protein backbone that is close to the actual backbone over much of the protein structure. Our computational experiments have confirmed the intuition that the medial axis of an isosurface extracted from an electron density map captures the essential characteristics of the geometry and topology of the molecule. With ideal data, we get essentially perfect results. When we apply our method to real density data, the results are, as expected, noisier, because the medial axis of an object is quite sensitive to small perturbations of the object's surface. A key problem we need to address is therefore the simplification and cleanup of the axis. This is important even with very good data, as the current method clearly oversamples the medial axis. We are also working to apply more biological knowledge to refine the polygonal chain that serves as our estimate of the backbone, as well as to extract more candidate backbone locations from the medial axis. In addition, we are looking at the persistence of backbone features across multiple isosurface levels in order to make our estimates more robust.
The figure above shows the medial axis of the 1.5 sigma isosurface of a fragment of the sec17 protein, along with an estimate of the backbone. The medial axis, shown in pink, was computed using the cocone software. The backbone estimate, computed by finding the diametral path in the medial axis, is shown in yellow. The protein consists primarily of alpha helices. Notice how the diametral estimate takes shortcuts along the alpha helices instead of following their turns. We plan to address this deficiency both by thinning the medial axis before estimating the backbone and by applying known geometric constraints on protein backbones.
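As an illustration of the diametral path mentioned above, here is a minimal sketch that treats the medial axis as a graph of 3D sample points joined by edges and returns the longest of all shortest paths. The graph representation, the Euclidean edge weights, and the names `shortest_paths` and `diametral_path` are assumptions made for this example; they are not taken from the cocone software or from our implementation.

```python
import heapq
from math import dist


def shortest_paths(adj, source):
    """Dijkstra from `source`; returns (distance, predecessor) dicts."""
    distance = {source: 0.0}
    pred = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > distance[u]:
            continue  # stale heap entry, a shorter route was already found
        for v, w in adj[u]:
            nd = d + w
            if nd < distance.get(v, float("inf")):
                distance[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return distance, pred


def diametral_path(points, edges):
    """Return (length, path) of the longest shortest path in the graph.

    points: list of (x, y, z) tuples (hypothetical medial-axis samples)
    edges:  list of (i, j) index pairs, weighted by Euclidean length
    """
    adj = {i: [] for i in range(len(points))}
    for a, b in edges:
        w = dist(points[a], points[b])
        adj[a].append((b, w))
        adj[b].append((a, w))

    best_len, best_path = -1.0, []
    for s in adj:  # exact but quadratic: one Dijkstra run per vertex
        distance, pred = shortest_paths(adj, s)
        t = max(distance, key=distance.get)  # farthest reachable vertex
        if distance[t] > best_len:
            best_len = distance[t]
            path, node = [], t
            while node is not None:  # walk predecessors back to s
                path.append(node)
                node = pred[node]
            best_path = path[::-1]
    return best_len, best_path


# Toy example: a straight chain of four points with one short side branch.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0), (1, 0.5, 0)]
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
length, path = diametral_path(points, edges)
print(length, path)
```

Because the path is built from shortest paths, it will cut across wherever the graph offers a shorter connection between two points, which is consistent with the shortcuts along the helices noted above. Running one Dijkstra search per vertex is only practical for small graphs; an oversampled medial axis would likely need thinning first.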
The backbone estimate and sidechain locations extracted from artificially generated 2 angstrom data. The backbone estimate is shown in red. The green sidechains are weighted by importance. As can be seen, the paths with highest importance (lightest green) coincide with the actual sidechain locations.
|