In article <1995Jul12.182420.12069 at alw.nih.gov>, johnk at spasm.niddk.nih.gov (John Kuszewski) writes:
> I think that part of this is explained by his having "solved" these
> structures in short pieces (because the program is computationally
> expensive).
This would be a lame excuse if true... We live in an era of inexpensive
300 MHz desktop workstations... In fact, even within the LINUS paper,
there are numerous instances of working with larger fragments, e.g.,
the GroES prediction.
My biggest problem with the paper is the 12-day turnaround from submission
to acceptance. There are numerous ambiguities in the description of the
methods (What proteins was it trained on? How do you assemble overlapping
fragments? How were the fragments for the results selected? How consistent
are independent LINUS runs on the same fragment? Why oh why did they neglect
to show the DHFR data?), which should have been caught by the referees and
fixed by the authors.
> But of course, you're right--the packing of secondary structure elements
> isn't all that amazing. If you looked at the RMSDs of these models
> to the crystal structures, they're pretty high--5 A is about the best
> (if I'm remembering correctly), and some are 10 A away.
As high as 12.1 A, which is essentially random (EGLIN 8-70), but the
secondary structure prediction is quite good. OTOH the work with IFB
is seemingly impressive. My bet is that this was the primary protein they
used to tune their parameters...
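(For concreteness: by RMSD I mean the usual least-squares best-fit
C-alpha RMSD. Here's a minimal sketch of that calculation in Python,
assuming two equal-length Nx3 numpy coordinate arrays; the function
name and array names are mine, not anything from the paper:)

    import numpy as np

    def ca_rmsd(P, Q):
        """Best-fit RMSD between two (N, 3) C-alpha coordinate sets."""
        P = P - P.mean(axis=0)               # center both structures
        Q = Q - Q.mean(axis=0)
        U, S, Vt = np.linalg.svd(P.T @ Q)    # Kabsch superposition
        d = np.sign(np.linalg.det(U @ Vt))   # guard against a reflection
        R = U @ np.diag([1.0, 1.0, d]) @ Vt  # optimal rotation of P onto Q
        diff = P @ R - Q
        return np.sqrt((diff ** 2).sum() / len(P))

    # usage: ca_rmsd(model_ca, xray_ca), coordinates in Angstroms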
> To start another thread, are models of that resolution useful for
> anything?
A wonderfully controversial question. I'm in the school of thought that
if I look at a model and it "looks" like the native structure (I know, horribly
subjective), then it is useful no matter what the RMSD. One of the big
problems with the results section in this paper is that the authors
usually do not show us a complete model of the predicted structure, but only
seemingly arbitrarily chosen fragments which "worked"...
> |> It is interesting that such a simple method seems to work that well.
> Precisely. I just saw Andrej Sali give a talk on MODELLER, and its
> output is amazingly good. However, he's using a very large empirical
> database. LINUS does extremely well for having so little starting
> information.
If LINUS is really predicting secondary structure as well as it seems
(I'm betting that it's not), then it does seem that the whole game's a lot
simpler than we thought. I can submit some anecdotal data here. In my PhD
work, I used a Sippl potential to predict several protein structures. It did
a wonderful job of secondary structure prediction on melittin, pancreatic
polypeptide, and crambin (as good as LINUS, I daresay, but these were easy
targets, all helix and coil), but it did a miserable job of packing things
together. This work is summarized in Molecular Simulations 13:299-320.
A lot of the figures in the LINUS paper look familiar to me.
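(Aside: the inverse-Boltzmann recipe behind a Sippl-type potential fits
in a few lines. This is a sketch of the general idea only, not Sippl's
actual parameterization; the histogram inputs and the +1 pseudocount
are my own assumptions:)

    import numpy as np

    def pair_potential(obs_counts, ref_counts, kT=0.582):
        """Inverse-Boltzmann pair potential from distance histograms:
        E(d) = -kT * ln(f_obs(d) / f_ref(d)), kT in kcal/mol (~293 K).
        obs_counts: binned distances seen for one residue-pair type;
        ref_counts: the same bins pooled over all pair types.
        The +1 pseudocount keeps empty bins finite."""
        f_obs = (obs_counts + 1.0) / (obs_counts.sum() + obs_counts.size)
        f_ref = (ref_counts + 1.0) / (ref_counts.sum() + ref_counts.size)
        return -kT * np.log(f_obs / f_ref)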
> |> Unfortunately this is one of many studies that would be much
> |> better if they had done some more rigorous testing. From
> |> what I read in the Johns Hopkins journal, the run takes overnight
> |> or so, which makes me wonder why they do not run this on 50 different
> George Rose told us at his (now infamous?) NIH talk that it's more
> like 1-2 weeks per "structure."
So how exactly did this talk go and how was it received? BTW at 6000
cycles per generation, with 8 epochs, that gives 47*8*6000 ~= 2e6 energy
evaluations per LINUS run. That's quite a bit of sampling, and the 1-2
week figure seems realistic.
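(Back-of-envelope, using those numbers; the per-evaluation time is
just arithmetic on the 1-week figure, not a measurement:)

    positions, epochs, cycles = 47, 8, 6000
    evals = positions * epochs * cycles   # 2,256,000 energy evaluations
    week = 7 * 24 * 3600                  # 604,800 seconds
    print(week / evals)                   # ~0.27 s per evaluation

A quarter of a second per energy evaluation is believable for a
mid-90s workstation, so a week-long run would not surprise me.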
> One last question: Are there any other algorithms that predict
> secondary structure as well as LINUS?
A tough question. That requires testing LINUS on a set of
proteins not involved in its development and comparing it to
the performance of PHD and GOR on those same proteins (assuming
GOR does not use them in its database either). Ignore arguments
that LINUS is not based on amino acid identity. If training set
data is involved in any way in the development of a method, then
it is not fair to rate the method's predictive power by its
performance on training set data. It is only fair to conclude
that the method has learned how to reproduce the training set.
The only fair test is on external data. The upcoming prediction
targets Moult is putting together should be a wonderful example
of this.
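(To make "as well as LINUS" measurable: the usual score is per-residue
three-state accuracy, Q3, computed only on proteins outside the
training set. A minimal sketch, with made-up prediction and
DSSP-style assignment strings:)

    def q3(predicted, observed):
        """Three-state (H/E/C) per-residue accuracy of a secondary
        structure prediction against, e.g., DSSP assignments."""
        assert len(predicted) == len(observed)
        hits = sum(p == o for p, o in zip(predicted, observed))
        return hits / len(observed)

    pred = "CCHHHHHHHHCCCEEEEECC"   # hypothetical prediction
    obs  = "CCHHHHHHHCCCCEEEECCC"   # hypothetical DSSP string
    print(q3(pred, obs))            # 0.9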
Scott