Tim Davies <daviest at shaw.ca> wrote:
> I just seem to remember that the folks who sort of did the human DNA
> thing last year have a JV with IBM to build a box of around 30 teraflops
> or so. It was to do structure of proteins, so naturally they believe
> that crunch power is the key to some aspect of the problem. Their box
> will be only partially efficient, so I was thinking that they must be
> very, very sure of their math to build a 100 mill box that is only a
> super-cluster-type device. Where there is confidence in the math, there
> are much more efficient routes than the method chosen. So I wonder: is
> the math known, and what is it? Does it fit a fast methodology?

There are many more approaches to computing protein structure than the
"brute force" way Artem described; they probably want to use one of these.
On the one hand, these are even more limited: they depend on empirical
knowledge about protein structures and folding processes. On the other
hand, they have in fact proven successful in predicting structure
(whereas doing QM on proteins in solvent is so far only a thought
experiment), and they might help us to derive rules and to understand the
principles of folding (and of evolution) of proteins.

One common method is "homology modelling". The basic assumption is that
proteins that are related by evolution (homologous) share a similar
structure, and that homology can be detected by sequence
similarity. Therefore the task is
a) to find homologous sequences and
b) to align the parts and model the unknown structure along the known
structure of the homologous protein, and then do some energy
optimization (using semiempirical methods, for example).
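
To make step b) a little more concrete, here is a minimal Python sketch of
the "model along the known structure" idea. It assumes the alignment is
already given, and it only copies one coordinate per residue from the
template. The function name `transfer_backbone`, the two toy sequences and
the coordinates are all hypothetical placeholders, not real structure data.
Real modelling programs go much further (loop building, side chains,
energy optimization).

```python
def transfer_backbone(aligned_target, aligned_template, template_coords):
    """Copy template coordinates onto target residues, column by column.

    aligned_target / aligned_template: equal-length strings, '-' marks a gap.
    template_coords: one (x, y, z) tuple per template residue (gaps excluded).
    Returns a list of (residue, coord) for the target; coord is None where
    the target has an insertion and no template residue to copy from.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    model = []
    t = 0  # index of the next unused template residue
    for tgt, tpl in zip(aligned_target, aligned_template):
        coord = None
        if tpl != "-":
            coord = template_coords[t]
            t += 1
        if tgt != "-":
            # Target residue: gets the template coordinate, or None if
            # this column is an insertion relative to the template.
            model.append((tgt, coord))
    return model


# Toy alignment (made up): the target lacks the template's Q and has an
# extra Y that the template does not have.
aligned_target = "MKT-AYLV"
aligned_template = "MKTQA-LV"
# Placeholder coordinates, one per template residue, spaced along x.
template_coords = [(3.8 * i, 0.0, 0.0) for i in range(7)]

model = transfer_backbone(aligned_target, aligned_template, template_coords)
for residue, coord in model:
    print(residue, coord)
```

The residues that come out with `None` are exactly the stretches that have
no counterpart in the known structure, which is where the real difficulty
of step b) begins.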

Even a) is not trivial: there may be insertions and deletions of
sequence stretches, one protein may have evolved by fusion of two
different genes, and so forth. Moreover, mutation rates, as well as the
evolutionary pressure to preserve structurally important residues, may
differ between species and between structure classes.
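
As an illustration of why insertions and deletions matter for step a),
here is a minimal sketch of a global alignment with gaps (the classic
Needleman-Wunsch dynamic programming scheme). The scoring values (match +1,
mismatch -1, gap -2) and the toy sequences are assumptions chosen for
readability. Real homology searches use substitution matrices and more
elaborate gap penalties, and "percent identity" has several conventions;
the one below divides by the alignment length.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a linear gap penalty.

    Returns (aligned_a, aligned_b, score); '-' marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,    # gap in b
                score[i][j - 1] + gap,    # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Identical, non-gap columns as a percentage of the alignment length."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * same / len(aligned_a)


# Toy example (made-up sequences): the second one lacks the stretch "EF".
aligned_a, aligned_b, score = needleman_wunsch("ACDEFGHIK", "ACDGHIK")
print(aligned_a)  # ACDEFGHIK
print(aligned_b)  # ACD--GHIK
print(score, percent_identity(aligned_a, aligned_b))
```

Without the two gap columns the residues after the deletion would all be
compared out of register, and the two sequences would look almost
unrelated.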

The parallel beta-helix proteins, for example, all share a common fold,
and most of them are bacterial proteins that have a very similar
enzymatic activity. However, the sequence similarity is rather low. This
is even more true for a second type of parallel beta-helix proteins,
the bacteriophage tailspikes: there is no sequence similarity at all in
the beta-helix part, but our group will soon solve the structures, and
the crystals do look as if they have the same fold.

And then, b) is even more challenging. This is not only a mathematical
problem, it's protein science. Just look into the reviews relating
to CASP (Critical Assessment of Techniques for Protein Structure
Prediction), an international "contest"; CASP4 is the most recent, IIRC.