In article <271rjf$5ui at terminator.rs.itd.umich.edu>, cash at geneva.csmil.umich.edu (Howard Cash) writes:
> I'd like to bring some more minds in on an ongoing discussion here:
> What makes a good match between NUCLEIC ACIDS? I ask about DNA
> to eliminate the discussion of PAM scores and likely mutations.
>
> If one is doing an error-tolerant comparison of strings that
> SHOULD match exactly (as is the case when doing plain text searches
> or sequencing fragment assembly) how should one balance length
> of match against percent match? Is an exact match of 20 bases
> better than a 96% exact match of 25 bases? I have seen heuristics
> used for this decision, but have never seen any of them backed
> up with much discussion.
>
> If you want to mail to me directly, I will post a summary to the
> net.
>
> -Hobie
> cash at csmil.umich.edu

Arratia & Waterman published a series of very interesting papers in the
mid-1980s dealing with the problem of statistical significance in DNA
sequence similarity (4-letter alphabets).
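
As a rough illustration of the length-versus-percent-match trade-off in the
question (a sketch only, and not the Arratia & Waterman analysis), one can
compare how likely each match is to arise by chance at a single fixed,
ungapped alignment. The assumptions here are mine: bases are independent and
equally frequent, so two aligned positions match with probability 1/4, and
the name tail_prob is hypothetical.

```python
from math import comb

def tail_prob(n, min_matches, p=0.25):
    """Chance of at least min_matches matches among n aligned positions.

    Assumes independent positions that each match with probability p
    (p = 1/4 for equally frequent bases). This is a plain binomial tail
    for ONE fixed ungapped alignment; it ignores the number of possible
    alignments, gaps, and base composition.
    """
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(min_matches, n + 1)
    )

exact_20 = tail_prob(20, 20)  # exact match of 20 bases
near_25 = tail_prob(25, 24)   # 96% match of 25 bases (at most 1 mismatch)

print(f"20 of 20: {exact_20:.3g}")
print(f"24 of 25: {near_25:.3g}")
```

Under these simplifying assumptions the 24-of-25 match is roughly an order of
magnitude less likely to occur by chance than the exact 20-base match, so by
this crude measure the longer, imperfect match is the stronger one. A real
search over long sequences would also have to account for the number of
positions tried.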
C A Ouzounis
EMBL
Heidelberg