campanelli lab wrote:
> I'm pretty statistically ignorant. What are some good rules to use in
> comparing two aligned sequences similarities or percent identities with
> a randomized version of one. For example, two sequences have 34%
> sequence identity in a pileup. After randomizing one of the sequences
> this falls to 15%. What is a good way to judge the significance of this?
> Any references would be appreciated. Thanks.
> Steve Johnson
> Biochemistry
> Univ. of Illinois
> sljohnsn at staff.uiuc.edu
Biology does not always pay attention to statistics.
There are some genes with little similarity that have the
exact same function. There are other genes that are nearly
identical and have opposing functions (one DNA-binding protein
may be a transcriptional activator and the other a
transcriptional repressor).
The simple measure of similarity or sequence identity
is a good start, but we would also like to know:

  Is the similarity evenly distributed throughout the
  genes, or are there conserved domains separated by
  variable regions?

  Are these two genes from the same species, or are
  you comparing a human gene to an E. coli gene?

  Are the first two positions of the codons more
  likely to be conserved than the last? (That is, what
  is the synonymous/nonsynonymous substitution ratio?
  This is not very useful when sequence identity is
  less than 60%.)
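
As a rough illustration of the first and third questions, here is a minimal
Python sketch that reports identity in sliding windows along an alignment and
identity at each codon position. The function names, the window and step
sizes, and the toy sequences are all made up for illustration. It assumes the
two sequences are already aligned, of equal length, use "-" for gaps, and (for
the codon function) form an ungapped, in-frame coding alignment. The
per-position identities are only a crude stand-in for a real
synonymous/nonsynonymous calculation.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def windowed_identity(a, b, window=30, step=10):
    """Return (start, percent identity) for sliding windows along the alignment."""
    last_start = max(len(a) - window, 0)
    return [
        (start, percent_identity(a[start:start + window], b[start:start + window]))
        for start in range(0, last_start + 1, step)
    ]


def codon_position_identity(a, b):
    """Percent identity at codon positions 1, 2 and 3.

    Assumes an ungapped, in-frame alignment that starts on a first codon position.
    """
    return [percent_identity(a[i::3], b[i::3]) for i in range(3)]


# Toy sequences (hypothetical): the two differ only at third codon positions.
seq_a = "ATGGCTAAAGGT"
seq_b = "ATGGCAAAGGGC"

print(percent_identity(seq_a, seq_b))         # 75.0
print(codon_position_identity(seq_a, seq_b))  # [100.0, 100.0, 25.0]
print(windowed_identity(seq_a, seq_b, window=6, step=6))
```

A flat profile from windowed_identity suggests evenly distributed similarity,
while peaks separated by troughs suggest conserved domains. Much lower identity
at the third codon position than at the first two is what one would expect if
most substitutions are synonymous.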
There are a number of ways to get statistically
significant similarity. Two genes could convergently evolve
toward a similar sequence. A single gene can duplicate and
diverge within a single species. A single gene can diverge between
two different species. Parts of one gene can recombine with
another.
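
For the original 34% versus 15% question, one common way to put a number on
"statistically significant" is to shuffle one sequence many times and see how
often chance alone gives an identity at least as high as the observed one.
The sketch below is only that, a sketch: the function names, the number of
shuffles and the seed are arbitrary choices, and it assumes two non-empty
sequences of equal length. It also keeps the alignment fixed, which is a
simplification. A pileup re-aligns the shuffled sequence and so tends to give
higher chance identities, which means a more faithful test would re-align
after every shuffle.

```python
import random


def identity(a, b):
    """Percent identity between two equal-length sequences, position by position."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / min(len(a), len(b))


def shuffle_test(a, b, n_shuffles=1000, seed=0):
    """Compare the observed identity with identities from shuffled copies of b.

    Returns (observed, null mean, z-score, empirical p-value).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = identity(a, b)
    letters = list(b)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys order
        null.append(identity(a, letters))
    mean = sum(null) / n_shuffles
    sd = (sum((v - mean) ** 2 for v in null) / (n_shuffles - 1)) ** 0.5
    z = (observed - mean) / sd if sd > 0 else float("inf")
    # Add-one correction keeps the empirical p-value above zero.
    p = (1 + sum(1 for v in null if v >= observed)) / (n_shuffles + 1)
    return observed, mean, z, p


# Toy input (hypothetical): a sequence compared with itself.
seq = "ACGT" * 10
observed, null_mean, z, p = shuffle_test(seq, seq)
print(observed, null_mean, z, p)
```

A small p-value only says the similarity is unlikely to be chance. As noted
above, it does not say which of these mechanisms produced it, or whether the
two genes share a function.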
--
____________________________________________________________________
|Brian T. Foley btf at t10.lanl.gov |
|HIV Database (505) 665-1970 |
|Los Alamos National Lab http://hiv-web.lanl.gov/index.html |
|Los Alamos, NM 87544 U.S.A. http://hiv-web.lanl.gov/~btf/home.html|
|____________________________________________________________________|