Multiple sequence alignment using geometric ideas
(Kolodny, Agarwal, Bilu)

We have investigated the well-known problem of multiple sequence alignment (MSA) using a fresh, geometric perspective. This is a central problems in computational biology and has been widely studied. For a given set of k sequences, each of length at most n, the best known algorithm can find a globally optimal solution in O(n^k) time. The problem is known to be NP-complete and a few approximation algorithms are known that guarantee 2-\Omega(1/k) approximation ratio in the worst case. Several practical heuristics with various advantages and disadvantages have been proposed, many of which are based on progressive application of a pairwise sequence-alignment algorithm. These approaches typically produce a locally optimal solution.

We are investigating an approach that works well when (1) each pair of sequences are either very similar to each other or quite different, and (2) if they are similar, then their optimal alignment has only a few gaps. These assumptions hold for typical protein sequence inputs to MSA problems (and the lower bound constructions do not extend under these assumptions); they were suggested to us by the partial order alignment (POA) heuristic of [1]. Using geometric techniques, we are developing a compressed representation of the dynamic-programming table, which can be computed efficiently. The efficiency of the algorithm will depend on how well the pairs of sequences align; tradeoffs are possible between efficiency and quality of the solution. Our algorithm can also be used as a preprocessing step with other MSA algorithms.

[1] Lee, C., Grasso, C., Sharlow, M.F., Multiple Sequence Alignment Using Partial Order Graphs, Bioinformatics, 2002, 452-464