Statistics help on Bestfit output

Richard Friedman friedman at convex.hhmi.columbia.edu
Thu Nov 21 05:12:55 EST 1996

Let A = score, m = estimate of the mean of the randomized scores,
s = estimate of the standard deviation of the randomized scores,
t = Student's t (see below).

If A lies outside m +/- ts (for an alignment score, A > m + ts),
then it is significant. t will in general depend on the
confidence limits (how sure you want to be that it is not
random) and the number of randomization samples that you
took. For example, for a large number of random samples,
t = 1.96 for 95% confidence limits. Two
good books are "Analytical Chemistry" by Skoog and West
and "Statistics for Chemists" by Youmans (I think). The
above treatment only holds for a completely random model
of biochemical sequences - something that is not really 
valid. More detailed methods of estimating statistical 
validity appear in the work of Karlin and coworkers but I 
don't understand their work. The most sensible easy 
criterion I have found is that if identity > 25% and length 
greater than 80%, the sequences are structurally related.
I hope that this was of some help.
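
For anyone who wants to try the significance criterion on their own
numbers, here is a rough sketch in Python. The function name
is_significant and the randomized scores are made up for illustration;
they are not Bestfit output. The value of t has to come from a Student's
t table for your confidence limits and number of randomizations: 1.96 is
the large-sample value for 95% limits, and about 2.26 applies for 10
randomizations (9 degrees of freedom). The sketch carries the same
caveat as the treatment itself, namely that it assumes a completely
random sequence model.

```python
import statistics


def is_significant(score, randomized_scores, t=1.96):
    """Apply the criterion A > m + t*s.

    score             -- A, the score of the real alignment
    randomized_scores -- scores obtained from the randomized sequences
    t                 -- Student's t from a table (1.96 is the
                         large-sample value for 95% confidence limits)
    Returns (significant, m, s).
    """
    m = statistics.mean(randomized_scores)   # estimate of the mean
    s = statistics.stdev(randomized_scores)  # sample standard deviation
    return score > m + t * s, m, s


# Hypothetical randomized scores, invented for this example only.
randomized = [38.0, 41.5, 40.2, 39.1, 42.3, 40.8, 37.9, 41.0, 39.6, 40.4]

# With only 10 randomizations, use the table value for 9 degrees of
# freedom (about 2.262 for 95% limits) rather than 1.96.
significant, m, s = is_significant(55.0, randomized, t=2.262)
print(f"m = {m:.2f}, s = {s:.2f}, significant = {significant}")
```

With more randomizations the estimates of m and s become steadier and t
moves toward 1.96, so the threshold m + ts becomes more trustworthy.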
