Paul Edward Kowalski wrote:
> I've been using GCG's MFold/Plotfold programs to look at possible
> secondary RNA structure (stemloops etc.) which might be found in the 5'
> UTR of the gene I'm working on. The output (I use "squiggle plot") gives
> you a nice picture of optimal and suboptimal structures ranked by the
> minimum energies. My question is- if the minimum energy value is
> dependent upon the LENGTH of the input sequence, what min. energy
> value is significant or relevant? Kozak, J Mol Bio 1994, mentions that
> stemloops with an energy of greater than -30 kcal/mol are an impediment
> to the scanning 43 ribosome, but the kcal/mol value just depends on how
> much seq you give MFold. Is the only way to tell what's relevant to:
Wrong. The criterion uses the stability of a single stem/loop (or
hairpin), whereas you seem to be using the total stability of the
5' UTR. The total stability is less relevant than the stability of each
individual hairpin (and these won't simply increase with sequence
length, as really long stems tend to be pretty rare in naturally
occurring sequences).
That answers the question about minimum free energies. Here's another
question: what is the difference in free energy between the optimal
structure and the suboptimal ones, and what does that tell you about the
equilibrium distribution of structures?
(After all, we're talking thermodynamics here!)
But what exactly do you mean by 'relevant'?
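To make the point about the equilibrium distribution concrete, here is a
minimal sketch that turns a list of free energies into relative Boltzmann
populations. The three energies in the example are hypothetical, not taken
from any real fold, and the calculation only covers the structures you
list, not the full ensemble, so treat the numbers as relative weights
among those candidates.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_populations(energies_kcal, temp_c=37.0):
    """Relative equilibrium populations for the listed structures only.

    energies_kcal: free energies in kcal/mol (more negative = more stable).
    """
    rt = R_KCAL * (temp_c + 273.15)
    e_min = min(energies_kcal)
    # Subtract the minimum first so the exponentials stay well behaved.
    weights = [math.exp(-(e - e_min) / rt) for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]

if __name__ == "__main__":
    # Hypothetical optimal and suboptimal energies, for illustration only.
    for energy, pop in zip([-12.0, -11.0, -10.0],
                           boltzmann_populations([-12.0, -11.0, -10.0])):
        print(f"{energy:6.1f} kcal/mol  population {pop:.3f}")
```

At 37 C, RT is roughly 0.6 kcal/mol, so a suboptimal structure within
about 1 kcal/mol of the optimum still holds a sizeable share of the
population.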
> A) shuffle the seq of interest -randomize- but maintain the base
> composition ratios, and do it again and see if the min energy has
> dropped?
That will tell you about the statistical significance of your structure,
or parts of it. Provided you shuffle quite a few times, you should get a
good estimate of how unusual the _stability_ of your structure is. Again,
the stability of the whole UTR isn't necessarily a good handle on
functional aspects. Picking the most stable hairpin from each of the
random structures might serve you better if you think stable structures
reduce translation rates.
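The shuffling test could be sketched roughly as follows. Note that
toy_best_stem_energy is a made-up placeholder, not MFold and not a
thermodynamic model: it just scores the longest perfectly paired stem so
the example runs on its own. In practice you would pass in a function
that returns the free energy of the most stable hairpin reported by your
folding program. The function names, the number of shuffles and the
example sequence are all assumptions for illustration.

```python
import random

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def toy_best_stem_energy(seq, min_loop=3):
    """Placeholder score: minus the length of the longest perfect stem.

    NOT a real free energy; swap in values from your folding program.
    """
    best = 0
    n = len(seq)
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            # Treat (i, j) as the innermost pair and extend the stem outward.
            k = 0
            while i - k >= 0 and j + k < n and (seq[i - k], seq[j + k]) in PAIRS:
                k += 1
            best = max(best, k)
    return -float(best)

def shuffle_preserving_composition(seq, rng):
    """Randomly permute the sequence; base composition stays the same."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def empirical_p_value(seq, energy_fn, n_shuffles=200, seed=0):
    """Fraction of shuffled sequences at least as stable as the original."""
    rng = random.Random(seed)
    observed = energy_fn(seq)
    shuffled = [energy_fn(shuffle_preserving_composition(seq, rng))
                for _ in range(n_shuffles)]
    as_stable = sum(1 for e in shuffled if e <= observed)
    # Add-one correction so the estimate is never exactly zero.
    return observed, (as_stable + 1) / (n_shuffles + 1)

if __name__ == "__main__":
    observed, p = empirical_p_value("GGGGGGAAAACCCCCC", toy_best_stem_energy)
    print(f"observed score {observed}, empirical p-value {p:.3f}")
```

A plain shuffle keeps the base composition but not dinucleotide
frequencies; whether that matters for your sequence is a separate
question.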
> B) Use other "length-matched" DNA sequences from randomly chosen Genbank
> entries, and see how their min. energies differ from the test one?
I assume you want to try UTR sequences there; coding sequences have
constraints imposed on them that might bias the stability of secondary
structure. And I would suggest focusing on the local stability of
hairpins instead of comparing free energies of structures of a fixed
length.
Again, what you learn from this kind of analysis is how unusual the
stability of your structure is compared with other UTRs. If stable
structures in 5' UTRs are a common motif in mRNAs, this might show up
that way.
But there's more to the biological function of RNA: kinetic effects
favoring a less stable RNA structure, or recognition of RNA structure by
proteins. It's a bit hard to say what's 'important' in your case without
knowing what you are looking for.
Michael Schmitz