/  A question has arisen on the net concerning why the scores +5 for a match and
/-4 for a mismatch are used as a default by BLASTN, and what the consequences of
/changing the default match score from +5 would be.

[some information deleted]

/  Given the model described above, there are only two distinct scores (match
/and mismatch), and multiplying by a constant changes only the number of bits
/represented by a unit score.  Fixing the score for a mismatch at -4 allows a
/range of PAM matrices to be selected by varying M, the score for a match, as
/summarized in the following table.
/
/      PAM      Percent    Bits/Unit   Average information  90% Efficiency
/M   distance  conserved     score     per position (bits)   range (PAMs)
/
/1      0.3       99.7       1.992            1.97              0 -   5
/2      5.3       94.9       0.968            1.63              0 -  17
/3     16.0       85.6       0.595            1.18              1 -  33
/4     30.2       75.0       0.396            0.79              8 -  49
/5     47.0       65.1       0.275            0.51             21 -  68
/6     65.0       56.5       0.196            0.32             36 -  86
/7     86.0       48.8       0.138            0.19             56 - 108
/8    109.0       42.5       0.096            0.11             79 - 131
/
/It will be seen that M = +5 (the BLASTN default) corresponds to a PAM distance
/of 47 PAMs, or sequences that are about 65% conserved when back mutations are
/considered.
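
The rows of the quoted table can be approximately reproduced with a short calculation. The description of the model was deleted from the quote, so the sketch below rests on assumptions: uniform base frequencies (0.25 each), a substitution process in which the three alternative bases are equally likely, and scores taken as base-2 log-odds ratios. The function names are illustrative only.

```python
import math

MISMATCH = -4  # mismatch score held fixed, as in the quoted table


def conserved_fraction(pam):
    """Fraction of positions still identical after `pam` PAMs.

    Assumes 1 PAM changes 1% of positions and that each of the three
    other bases is an equally likely replacement (back mutations included).
    """
    return 0.25 + 0.75 * (1.0 - 0.04 / 3.0) ** pam


def log_odds_bits(q):
    """Log-odds scores in bits for a match and a mismatch, given the
    conserved fraction q (0.25 < q < 1) and uniform base frequencies."""
    match = math.log2(4.0 * q)
    mismatch = math.log2(4.0 * (1.0 - q) / 3.0)
    return match, mismatch


def table_row(q):
    """Return (M, bits per unit score, average information per position)."""
    match, mismatch = log_odds_bits(q)
    bits_per_unit = mismatch / MISMATCH       # one score unit, in bits
    m_score = match / bits_per_unit           # match score when mismatch = -4
    info = q * match + (1.0 - q) * mismatch   # relative entropy, in bits
    return m_score, bits_per_unit, info


if __name__ == "__main__":
    # Percent-conserved values copied from the quoted table.
    for percent in (99.7, 94.9, 85.6, 75.0, 65.1, 56.5, 48.8, 42.5):
        m, bits, info = table_row(percent / 100.0)
        print(f"{percent:5.1f}%  M={m:5.2f}  bits/unit={bits:.3f}  info={info:.2f}")
```

Under these assumptions a conserved fraction of 0.75 gives M = 4, about 0.396 bits per unit score, and about 0.79 bits per position, in line with the M = 4 row; conserved_fraction(30.2) likewise comes out close to 0.75.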

[a little more stuff deleted]

/PAM-47 scores are at least 90% efficient in
/detecting the similarity of sequences diverged by anywhere from 21 to 68 PAMs
/(82% to 55% sequence conservation), which seems like the most typical range of
/similarity sought.

What does it mean to say that the search is 90% efficient? Does it mean
that up to 10% of the matches in this range could be missed, and that an
even higher proportion of matches outside the range (for example, sequences
with 100% identity) could be missed as well?
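
One possible reading of the 90% figure, offered here as an assumption and not as the quoted author's definition, is that efficiency is the average score per position (in bits) obtained with scores tuned for one PAM distance, divided by the most information available per position at the true distance. The sketch below uses the same assumed model of uniform base frequencies and equally likely substitutions; all function names are hypothetical.

```python
import math


def conserved_fraction(pam):
    """Fraction of identical positions after `pam` PAMs (assumed model:
    1% change per PAM, three equally likely replacement bases)."""
    return 0.25 + 0.75 * (1.0 - 0.04 / 3.0) ** pam


def log_odds_bits(q):
    """Match and mismatch log-odds scores in bits for conserved fraction q."""
    return math.log2(4.0 * q), math.log2(4.0 * (1.0 - q) / 3.0)


def relative_entropy(q):
    """Average information per position (bits) with ideally matched scores."""
    match, mismatch = log_odds_bits(q)
    return q * match + (1.0 - q) * mismatch


def efficiency(score_pam, true_pam):
    """Ratio of the expected per-position score, using scores built for
    `score_pam`, to the relative entropy at `true_pam` (both must be > 0)."""
    if score_pam <= 0 or true_pam <= 0:
        raise ValueError("PAM distances must be positive in this sketch")
    match, mismatch = log_odds_bits(conserved_fraction(score_pam))
    q_true = conserved_fraction(true_pam)
    expected = q_true * match + (1.0 - q_true) * mismatch
    return expected / relative_entropy(q_true)


if __name__ == "__main__":
    for true_pam in (5, 21, 47, 68, 100):
        print(true_pam, round(efficiency(47, true_pam), 3))
```

With this definition, PAM-47 scores come out at roughly 0.90 at both 21 and 68 PAMs, which agrees with the quoted range. If the reading is right, the figure describes information gained per aligned position, not the fraction of matches found or missed.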

On an unrelated topic, do the people who maintain the GenBank server have
any plans to provide a means of restricting the amount of output that BLAST
returns? I received a message from GenBank saying that our mailer had
bounced the BLAST output because it was too large. The people who provide
the TCP/IP package for our VAX say that their SMTP mailer automatically
rejects mail larger than 1 MByte to prevent the malicious squandering of
system resources.

