Fast Protein Folding in the Hydrophobic-hydrophilic Model
Within Three-eights of Optimal
William E. Hart and Sorin Istrail
Sandia National Laboratories
Massively Parallel Computing Research Laboratory
MS 1110
Abuquerque, New Mexico
wehart at cs.sandia.gov, scistra at cs.sandia.gov
We present performance-guaranteed approximation algorithms for the
protein folding problem in the hydrophobic-hydrophilic model,
Dill (1985). To our knowledge, our algorithms are the first
approximation algorithms in the literature with guaranteed performance
for this model, Dill (1994). The hydrophobic-hydrophilic model
abstracts the dominant force of protein folding: the hydrophobic
interaction. The protein is modeled as a chain of amino acids of
length n which are of two types: H (hydrophobic, i.e., nonpolar)
and P (hydrophilic, i.e., polar). Although this model is a simplification
of more complex protein folding models, the protein folding
structure prediction problem is notoriously difficult for this model.
Our algorithms have linear (3n) or quadratic time
and achieve a three-dimensional protein conformation that has a
guaranteed potential free energy within 3/8 of optimal. By
achieving speed and near-optimality simultaneously, our algorithms are
consistent with the recently proposed framework of protein folding by
Sali, Shakhnovich and Karplus (1994). Equally important, the folding
pathway and final conformations of our algorithms are biologically
plausible. The algorithms define folding pathways that fit within the
framework of diffusion-collision protein folding proposed by Karplus
and Weaver (1979), and final conformations generated by the algorithms
have significant secondary structure (helixes, anti-parallel sheets,
beta sheets, hydrophobic core). Previous algorithms have employed
exhaustive search of protein sequences and conformation for sequences
of length 18 or less. For longer sequences (length <= 30),
previous algorithms have performed random sampling of sequences for
which exhaustive search of conformations was performed. Our result
answers the open problem of Ngo, Marks and Karplus (1994) about the
possible existence of a performance guaranteed approximation algorithm
for protein structure prediction in any well-studied model of protein
folding.