risler at cgmvax.cgm.cnrs-gif.fr
Tue Mar 2 07:06:47 EST 1993

 Dear fellow netters,

 Like many of you, I use BLAST at NCBI for searching sequence databanks.
 Like many of you, I don't like using programs when I don't understand what
 they do and how they do it.
 Hence I've tried to read the original papers about BLAST and, in particular,
 I've tried to understand how they compute the probability P(N) associated
 with a given score. I must confess that I failed to fully understand, either
 because I'm just stupid or because it is not clearly written. In any
 case, I thought that P(N) was computed from the figures obtained by a very
 large number of simulations. If this were true, then this probability should
 be the same for the same hit whatever the databank used.

 A colleague of mine recently searched a protein sequence with BLAST against
 the "non-redundant protein databank" and against Swissprot. In both cases she
 got the same hit with the same score, but with different probabilities.
 With the non-redundant database P(N) was 0.84 and with Swissprot P(N) was
 0.51. The segment pairs were exactly the same in both cases.
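
 To make the discrepancy concrete, here is a rough sketch in Python. It
 rests on an assumption that is not stated anywhere in the BLAST output
 itself: that P(N) has the Poisson form P = 1 - exp(-E), where E is the
 expected number of chance hits at that score. The function name
 expected_hits is made up for illustration. Under that assumption, the two
 reported probabilities can be converted back to expected counts and
 compared.

```python
import math

def expected_hits(p):
    """Invert the ASSUMED relation P = 1 - exp(-E) to recover E.

    This Poisson form is an assumption made for illustration; it is not
    taken from the BLAST output quoted in this message.
    """
    return -math.log(1.0 - p)

p_nr = 0.84         # P(N) reported against the non-redundant databank
p_swissprot = 0.51  # P(N) reported against Swissprot

e_nr = expected_hits(p_nr)
e_swissprot = expected_hits(p_swissprot)

# Ratio of the expected numbers of chance hits in the two searches
ratio = e_nr / e_swissprot

print(f"E (non-redundant) = {e_nr:.3f}")
print(f"E (Swissprot)     = {e_swissprot:.3f}")
print(f"ratio             = {ratio:.2f}")
```

 If the assumption holds, the expected counts differ by a factor of about
 2.6, which is what one would see if E scaled with the size of the databank
 searched. The actual sizes of the two databanks have not been checked here.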

 Could somebody help me understand?

 Thank you,

 | Jean-Loup Risler                   |                               |
 | CNRS                               | risler at frcgm51.bitnet         |
 | Centre de Genetique Moleculaire    | risler at cgmvax.cgm.cnrs-gif.fr |
 | 91198  Gif sur Yvette Cedex France |                               |

