In article <1993Mar2.120540.1382 at gserv1.dl.ac.uk>, risler at cgmvax.cgm.cnrs-gif.fr writes:
> Dear fellow netters,
>
> Like many of you, I use BLAST at NCBI for searching sequence databanks.
> Like many of you, I don't like using programs when I don't understand what
> they do (and how).
> Hence I've tried to read the original papers about BLAST and, in particular,
> I've tried to understand how they compute the probability P(N) associated
> with a given score. I must confess that I failed to fully understand it, either
> because I'm just stupid or because it is not clearly written (or both). In any
> case, I thought that P(N) was computed from the figures obtained by a very
> large number of simulations. If this were true, then this probability should
> be the same for the same hit whatever the databank used.
>
> A colleague of mine recently searched a protein sequence with BLAST against
> the "non-redundant protein databank" and against Swissprot. She got in both
> cases the same hit with the same score, but with different probabilities.
> With the non-redundant database P(N) was 0.84 and with Swissprot P(N) was
> 0.51. The segment pairs were exactly the same in both cases.
>
> Could somebody help me understand?
>
> Thank you,
>
> --------------------------------------------------------------------
> | Jean-Loup Risler | |
> | CNRS | risler at frcgm51.bitnet |
> | Centre de Genetique Moleculaire | risler at cgmvax.cgm.cnrs-gif.fr |
> | 91198 Gif sur Yvette Cedex France | |
> --------------------------------------------------------------------