Computing ensemble properties of molecular pathways using probabilistic conformational roadmaps.
(Latombe; Apaydin, Brutlag, Guestrin, Hsu, Singh, Varma)

Many interesting properties of macromolecular motion are best characterized statistically, by considering an ensemble of motion pathways rather than an individual one. For example, the new view of protein folding kinetics replaces the traditional idea of a single folding pathway with the broader notion of energy landscapes and folding funnels. Proteins are thought to fold in a multi-dimensional funnel by following a myriad of pathways, all leading to the same native structure. To carry out computational studies of macromolecular motion in this framework, we need efficient algorithms that can quickly explore many motion pathways and compute ensemble properties. Classic simulation techniques such as Monte Carlo and molecular dynamics tend to focus on individual pathways. They become computationally impractical when applied naively to generate and analyze a large number of pathways. To deal with this issue, we have introduced a new computational scheme, Stochastic Roadmap Simulation, that derives from the probabilistic roadmaps previously developed in robotics. A conformational roadmap is a collection of molecular conformations sampled at random. Each conformation is connected to its nearest neighbors by arcs labeled with transition probabilities. These probabilities are derived from the energy differences between conformations and are chosen so that stochastic simulations in the roadmap are equivalent to Monte Carlo simulations. Moreover, tools from Markov chain theory (first-step analysis) allow us to compute ensemble properties directly by solving a sparse linear system, without performing any simulation explicitly. We have implemented this approach and experimented with it in two domains: the computation of the transmission coefficient (probability of folding) in protein folding, and the estimation of binding time, escape time, and absolute energy flux in ligand-protein binding.
Our experimental results correlate highly with those obtained using classical techniques, yet were computed several orders of magnitude faster.
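
To make the first-step analysis concrete, here is a minimal Python sketch of how a probability of folding could be computed on a small roadmap. It is an illustration under stated assumptions, not the implementation used in this work: the function names transition_matrix and p_fold are hypothetical, the Metropolis-style transition rule is one common choice rather than necessarily the rule used here, and a dense Gaussian elimination stands in for the sparse solver a real roadmap would need. For each conformation i that is neither folded nor unfolded, the sketch solves p_i = sum_j P_ij p_j, with p = 1 on folded nodes and p = 0 on unfolded nodes.

```python
import math

def transition_matrix(energies, neighbors, kT=1.0):
    """Assumed Metropolis-style rule: from node i, pick a neighbor uniformly
    and accept with probability min(1, exp(-dE/kT)); otherwise stay at i."""
    n = len(energies)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            dE = energies[j] - energies[i]
            accept = math.exp(-dE / kT) if dE > 0 else 1.0
            P[i][j] = accept / len(neighbors[i])
        P[i][i] = 1.0 - sum(P[i])  # leftover mass is the self-transition
    return P

def p_fold(P, folded, unfolded):
    """First-step analysis: probability of reaching a folded node before an
    unfolded one. Assumes every other node can reach at least one of them."""
    n = len(P)
    transient = [i for i in range(n) if i not in folded and i not in unfolded]
    idx = {node: k for k, node in enumerate(transient)}
    m = len(transient)
    # Augmented system (I - P_TT) x = P_TF * 1, last column is the right side.
    A = [[0.0] * (m + 1) for _ in range(m)]
    for i in transient:
        r = idx[i]
        A[r][r] += 1.0
        for j in range(n):
            if j in idx:
                A[r][idx[j]] -= P[i][j]
            elif j in folded:
                A[r][m] += P[i][j]
    # Gauss-Jordan elimination with partial pivoting (dense, for clarity only).
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(m):
            if r != c and A[r][c] != 0.0:
                f = A[r][c] / A[c][c]
                for k in range(c, m + 1):
                    A[r][k] -= f * A[c][k]
    result = {i: 1.0 for i in folded}
    result.update({i: 0.0 for i in unfolded})
    for i in transient:
        result[i] = A[idx[i]][m] / A[idx[i]][idx[i]]
    return result

# Toy roadmap: five conformations on a chain with a flat energy landscape.
energies = [0.0] * 5
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
P = transition_matrix(energies, neighbors)
pf = p_fold(P, folded={4}, unfolded={0})
```

On this flat toy landscape the walk is symmetric, so the computed probability grows linearly along the chain from the unfolded end to the folded end. No trajectory is ever simulated: a single linear solve yields the value for every conformation at once.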