While very interesting, the equation below won't work (I think)
because the various parameters aren't all independent if you're only
considering the best (or near best) regions of alignment.
There is an extensive theory behind the probabilities of two sequences
matching each other with a given level of similarity over a particular length.
An introduction to the theory, with many references, can be found on
the BLAST help page at:
An excerpt is:
From Karlin and Altschul (1990), the principal equation
relating the score of an HSP to its expected frequency of
chance occurrence is:
E = K N exp(-Lambda S)
where E is the expected frequency of chance occurrence of an
HSP having score S (or one scoring higher); K and Lambda are
Karlin-Altschul parameters; N is the product of the query
and database sequence lengths, or the size of the search
space; and exp is the exponentiation function.
Lambda may be thought of as the expected increase in reliability
of an alignment associated with a unit increase in
alignment score. Reliability in this case is expressed in
units of information, such as bits or nats, with one nat
being equivalent to 1/log(2) (roughly 1.44) bits.
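To make the excerpt's equation concrete, here is a minimal Python sketch. It takes N as the plain product of the query and database lengths, as the excerpt describes, with no further correction. The function names and the K, Lambda, length, and score values are hypothetical, chosen only for illustration; real K and Lambda depend on the scoring system and residue composition.

```python
import math


def karlin_altschul_evalue(score: float, k: float, lam: float, m: int, n: int) -> float:
    """E = K * N * exp(-Lambda * S), with N = m * n (query length times database length)."""
    search_space = m * n
    return k * search_space * math.exp(-lam * score)


def nats_to_bits(nats: float) -> float:
    """One nat is 1/ln(2), roughly 1.44, bits."""
    return nats / math.log(2)


# Hypothetical values for illustration only, not taken from any real scoring matrix.
K, LAMBDA = 0.13, 0.32
S = 50
e = karlin_altschul_evalue(score=S, k=K, lam=LAMBDA, m=300, n=1_000_000)
print(f"E = {e:.3g}")
print(f"Lambda * S = {LAMBDA * S:.1f} nats = {nats_to_bits(LAMBDA * S):.2f} bits")
```

Because S enters through exp(-Lambda S), each unit increase in score divides E by a constant factor of exp(Lambda). This is the sense in which Lambda measures the gain in reliability per unit of score.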
leen at bio-3.bsd.uchicago.edu (Lee Newberg) writes:
>The average number of "matches" with exactly those parameters
>that arises randomly is not too difficult to figure out.
> Putting it all together gives
>E = (L1 + 1 - LR) * (L2 + 1 - LR) * (LR choose N) * (25%)^(LR-N) * (75%)^N
>In article <4u6oio$5tg at mserv1.dl.ac.uk>,
>Leonid A. Sadofiev <leosad at may.stud.pu.ru> wrote:
>> Dear all,
>>
>> I can't think of a good way to calculate the following:
>>
>> When I compare two sequences (amino acid or nucleotide)
>> with lengths L1 and L2, I get a common region of
>> length LR containing N mismatches.
>> The questions are:
>> What is the chance of obtaining such a region in unrelated sequences?
>> Can I use binomial formulas for this case?
>>
>> Could anybody send me the formulas to calculate this chance,
>> or a reference for it?
>>
>> Please reply to leosad at may.stud.pu.ru
>>
>> Thanks in advance.
>> Leonid A. Sadofiev
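Newberg's estimate quoted above can be evaluated directly. This minimal Python sketch follows the quoted formula. It assumes, as the 25%/75% figures imply, four equally likely nucleotides, so a random position matches with probability 0.25. The function name and the `p_match` parameter are hypothetical additions for illustration.

```python
from math import comb


def expected_random_matches(l1: int, l2: int, lr: int, n: int, p_match: float = 0.25) -> float:
    """Newberg's estimate: expected number of length-lr ungapped windows with
    exactly n mismatches between random sequences of lengths l1 and l2."""
    if not (0 <= n <= lr and 1 <= lr <= min(l1, l2)):
        raise ValueError("need 0 <= n <= lr and 1 <= lr <= min(l1, l2)")
    # Number of ways to place a window of length lr in each sequence.
    placements = (l1 + 1 - lr) * (l2 + 1 - lr)
    # Binomial probability: lr - n chance matches and n mismatches, in any order.
    pattern_prob = comb(lr, n) * p_match ** (lr - n) * (1 - p_match) ** n
    return placements * pattern_prob


# Illustrative numbers only: two 1000-nt sequences, a 20-nt region with 2 mismatches.
print(f"{expected_random_matches(1000, 1000, 20, 2):.3g}")
```

As noted at the top of this reply, this count treats every window position independently and asks for exactly N mismatches. It is therefore not the probability that the best region an alignment program reports looks like this, so treat the result as a rough figure.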
Steven E. Brenner | S.E.Brenner at bioc.cam.ac.uk
MRC Laboratory of Molecular Biology |
Hills Road | Office: +44 1223 248011
Cambridge CB2 2QH, UK | Fax: +44 1223 213556