Whatever talk is going around, the protein folding problem is
not solved by LINUS. The approach used in LINUS lacks too many
of the parts necessary to solve the problem.
Also, the true context of the work should be stated as
'another approach to predicting the secondary structure of small
proteins (or long peptides), with partial and imprecise topological
prediction.'
I would find it useful to have a very crude and fast preview of
any small protein before committing to a really serious prediction.
Jong
In article <3u71g7$bu5 at saba.info.ucla.edu>, legrand at tesla.mbi.ucla.edu (Scott Le Grand) says...

> In article <6gwxdmdks7.fsf at hodgkin.mbi.ucla.edu>, arne at hodgkin.mbi.ucla.edu (Arne Elofsson) writes:
>
>> In article <3u3t5m$pdb at saba.info.ucla.edu>, legrand at tesla.mbi.ucla.edu (Scott Le Grand) writes:
>>> In article <1995Jul12.182420.12069 at alw.nih.gov>, johnk at spasm.niddk.nih.gov (John Kuszewski) writes:
>>>
>>>> I think that part of this is explained by his having "solved" these structures in short pieces (because the program is computationally expensive).
>>>
>>> This would be a lame excuse if true... We live in an era of inexpensive 300 MHz desktop workstations... In fact, even within the LINUS paper there are numerous instances of working with larger fragments, e.g. the GroES prediction.
>>>
>>> My biggest problem with the paper is the 12-day turnaround from submission to acceptance. There are numerous ambiguities in the description of the methods (What proteins was it trained on? How do you assemble overlapping fragments? How were the fragments for the results selected? How consistent are independent LINUS runs on the same fragment? Why oh why did they neglect to show the DHFR data?) which should have been caught by the referees and fixed by the authors.
>>>
>> Yeah, I can agree that 12 days seems very, very short. (Any reviewers want to identify themselves?)
>>
>> However, it must be assumed that these (and the DHFR) were the only simulations with these parameters done at the time of submission.
>>
>> They do not assemble overlapping fragments, and they do not claim to.
> You're right, but if you look closely, you'll notice that the lengths of some reported fragments vary rather wildly and seem to have been selected to match the ends of various elements of secondary structure. This is observer-introduced bias, no matter how small... Examples include PCY 17-35, PCY 36-50, PCY 51-65, PCY 1-16, PCY 66-99, EGLIN 8-40, EGLIN 8-70, and EGLIN 40-70.
>
> I'm also very interested to know what is generated by multiple runs on the same fragment... If they get precisely the same structure (0.0 A RMSD), then they aren't using a proper random number generator...
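
A run-to-run consistency check like the one Scott asks for is cheap to script. Below is a minimal Python sketch, assuming each run dumps its C-alpha coordinates as an Nx3 array; the RMSD is computed after least-squares superposition with the standard Kabsch algorithm. The simulate() driver is hypothetical; nothing here is from the LINUS code.

    import numpy as np

    def kabsch_rmsd(P, Q):
        """RMSD between two (N, 3) C-alpha coordinate sets after
        optimal superposition (Kabsch algorithm)."""
        P = P - P.mean(axis=0)              # center both structures
        Q = Q - Q.mean(axis=0)
        U, S, Vt = np.linalg.svd(P.T @ Q)   # SVD of the covariance matrix
        d = np.sign(np.linalg.det(Vt.T @ U.T))
        R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # proper rotation only
        return np.sqrt(((P @ R.T - Q) ** 2).sum() / len(P))

    # Independent runs seeded differently should NOT all agree to 0.0 A:
    # runs = [simulate(fragment, seed=s) for s in range(10)]  # hypothetical
    # print(max(kabsch_rmsd(a, b) for a in runs for b in runs))
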
>> I do not agree; this paper is no more ambiguous than many other papers.
>> The problem is that it was so extremely hyped before publication.
> It's right in the middle of the spectrum of ambiguity. It provides a good overview of the method, but when it comes down to implementing it from the methods section, there are unclear aspects, such as parts of the potential function (try to figure out >EXACTLY< what the hydrophobic component is, and just what is that 2nd sidechain atom on Thr?) and the locking of triplets.
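
To make that concrete, here is the kind of guess an implementer is forced to make. The sketch below is one generic hydrophobic contact term, emphatically NOT the published LINUS function; the residue set, the distance cutoff, and the weight are all assumptions the paper leaves open.

    import numpy as np

    # One plausible (assumed) choice of hydrophobic residues.
    HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "PHE", "MET", "TRP"}

    def hydrophobic_score(residues, coords, cutoff=6.5):
        """Score -1 for each hydrophobic pair within cutoff Angstroms,
        skipping sequence neighbors. Purely illustrative parameters."""
        score = 0.0
        for i in range(len(residues)):
            for j in range(i + 2, len(residues)):
                if residues[i] in HYDROPHOBIC and residues[j] in HYDROPHOBIC:
                    if np.linalg.norm(coords[i] - coords[j]) < cutoff:
                        score -= 1.0
        return score
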
>> It is quite certain that they optimised their (very simple) parameters on this training set, or a part of it, but they do not claim anything else, so you cannot hold them to that. (For instance, what do you think Jim did when he optimised his parameters for the 3D-1D paper?)
> Jim's folding of 434 repressor clearly is not prediction. But Jim makes it clear which protein he used to train his parameters. This paper does not give any insight into how the parameters were developed, and that is the ambiguity.
>>>> To start another thread, are models of that resolution useful for anything?
>>>
>>> A wonderfully controversial question. I'm of the school of thought that if I look at a model and it "looks" like the native structure (I know, horribly subjective), then it is useful no matter what the RMSD. One of the big problems with the results section of this paper is that the authors usually do not show us a complete model of the predicted structure, but only seemingly arbitrarily chosen fragments which "worked"...
>>>
>> If they did that (which I really doubt), it is fraud and scientific misconduct.
>> I have the feeling that all they actually did is what is shown in the paper. And if you want to look at structures, everything is there in the MolScript pictures. What more can you ask for?
> Well, they definitely have not shown >ALL< that they did. They neglect to show us even a single fragment of DHFR... I wouldn't call it scientific fraud and misconduct, though... They do show examples where the algorithm fails, even though they try to talk their way out of it, e.g. the packing of the last helix of cytochrome b562...
>>>> |> It is interesting that such a simple method seems to work that well.
>>>>
>>>> Precisely. I just saw Andrej Sali give a talk on MODELLER, and its output is amazingly good. However, he's using a very large empirical database. LINUS does extremely well for having so little starting information.
>>>
>>> If LINUS is really predicting secondary structure as well as it seems (I'm betting that it's not), then it does seem that the whole game's a lot simpler than we thought. I can submit some apocryphal data here. In my PhD work, I used a Sippl potential to predict several protein structures. It did a wonderful job of secondary structure prediction on melittin, pancreatic polypeptide, and crambin (as good as LINUS, I would daresay, but this was all helix and coil prediction on easy targets), but it did a miserable job of packing things together. This work is summarized in Molecular Simulation 13:299-320. A lot of the figures in the LINUS paper look familiar to me.
>>>
>> But you could not predict any sheets. (:
> True :-). The most impressive part of this paper is the prediction of sheets... The least impressive is the calculation of RMSDs between predicted and X-ray helices...
>> And even if they do not do such great work on tertiary structure packing, it is much better than your PhD work.
> Certainly true of IFB, but the eglin structure looks to be about as much of a mess as my crambin (12.1 A RMSD versus 9.5 A)... No other reasonably complete tertiary structures are presented, except for the GroES prediction, which remains just that...
>> Skolnick also wrote in his 1994 papers (Kolinski & Skolnick, Proteins 1994) that their potential performed very well in predicting sec. str. They claimed to have a paper in preparation, but at least I have not seen it.
> You're right. I suspect that these potentials may be fairly good at such prediction where the segment has a locally determined preferred conformation, but are they performing better than PHD or GOR?
>>> Their "sec.str. prediction" is probably as good for approximately as
many
>> targets as Rose's. However their targets were less diverse and that
was
>> not at all the focus on the papers. (Skolnick also had to use slightly
>> different potential functions for one protein (ubiquitin ?))
>>Yep...
>>>> One last question: Are there any other algorithms that predict secondary structure as well as LINUS?
>>>
>>> A tough question. That requires testing LINUS on a set of proteins not involved in its development, and comparing its performance on those same proteins to PHD and GOR (assuming GOR does not use them in its database either). Ignore arguments that LINUS is not based on amino acid identity. If training set data is involved in any way in the development of a method, then it is not fair to rate the predictive power of the method by its performance on training set data. It is only fair to conclude that the method has learned how to reproduce the training set. The only fair test is on external data. The upcoming prediction targets Moult is putting together should be a wonderful example of this.
>>>
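
The comparison Scott describes is mechanically trivial once you have the predictions; the hard part is keeping the test proteins external. A minimal Python sketch with the standard three-state Q3 score and hypothetical inputs:

    def q3(predicted, observed):
        """Q3: fraction of residues whose 3-state (H/E/C) assignment matches."""
        assert len(predicted) == len(observed)
        return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

    # Hypothetical inputs: per-residue H/E/C strings for proteins NOT used
    # in the method's development, with DSSP assignments as the reference.
    external = {"1abc": ("HHHHCCEEEECC", "HHHHHCEEEECC")}
    for pdb, (pred, dssp) in external.items():
        print(pdb, round(q3(pred, dssp), 3))
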
>> We do not actually need this, as long as people report what they do.
>
> I disagree. It is very difficult, perhaps impossible, to truly eliminate all forms of bias in structure prediction when you're working with targets whose structures are already known. Nowhere was this more apparent than in the all-around failures at homology modelling presented at last year's workshop, when the predictors had no clear idea what the target structure was... The results were nothing like the claims of their respective papers...
>> If you optimize your parameters so that they work very well on a small set of proteins (as Rose probably did), it is not bad science to report that. Even if it would not work on anything outside the test set, it might be very useful and interesting.
> It is not bad science to report it. It is bad science to report it as prediction. The neural network people went through this phase many years ago, demonstrating that one can converge a sufficiently elaborate neural network to almost any training set... We seem not to have gotten past it yet...
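
That convergence point is easy to demonstrate without any neural network; any model with enough capacity to memorize will do. A toy Python illustration: a nearest-neighbour "predictor" trained on random labels is perfect on its training set and a coin flip on anything external.

    import numpy as np

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 20))
    y_train = rng.integers(0, 2, size=100)   # random labels: nothing to learn
    X_test = rng.normal(size=(100, 20))
    y_test = rng.integers(0, 2, size=100)

    def predict_1nn(X_ref, y_ref, X):
        # Pure memorization: copy the label of the nearest training point.
        d = ((X[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=-1)
        return y_ref[d.argmin(axis=1)]

    print((predict_1nn(X_train, y_train, X_train) == y_train).mean())  # 1.0
    print((predict_1nn(X_train, y_train, X_test) == y_test).mean())    # ~0.5
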
>> However, you are right that it is much more impressive to predict a completely independent test set.
>
> That's what it will take to knock my socks off...
> Overall, they score a hit on IFB, on the prediction of helical secondary structure in cytochrome b562 and myoglobin, and on some of the secondary structure of plastocyanin and eglin. They fail at the task of tertiary structure prediction for eglin, cytochrome c, and DHFR. No complete structures are shown for any other protein, so nothing can be said about them...
>
> Scott