/ A question has arisen on the net concerning why the scores +5 for a match and
/-4 for a mismatch are used as a default by BLASTN, and what the consequences of
/changing the default match score from +5 would be.
[some information deleted]
/ Given the model described above, there are only two distinct scores (match
/and mismatch), and multiplying by a constant changes only the number of bits
/represented by a unit score. Fixing the score for a mismatch at -4 allows a
/range of PAM matrices to be selected by varying M, the score for a match, as
/summarized in the following table.
/
/     PAM       Percent    Bits/Unit  Average information  90% Efficiency
/ M   distance  conserved  score      per position (bits)  range (PAMs)
/
/ 1     0.3       99.7      1.992            1.97             0 - 5
/ 2     5.3       94.9      0.968            1.63             0 - 17
/ 3    16.0       85.6      0.595            1.18             1 - 33
/ 4    30.2       75.0      0.396            0.79             8 - 49
/ 5    47.0       65.1      0.275            0.51            21 - 68
/ 6    65.0       56.5      0.196            0.32            36 - 86
/ 7    86.0       48.8      0.138            0.19            56 - 108
/ 8   109.0       42.5      0.096            0.11            79 - 131
/
/It will be seen that M = +5 (the BLASTN default) corresponds to a PAM distance
/of 47 PAMs, or sequences that are about 65% conserved when back mutations are
/considered.
[a little more stuff deleted]
/PAM-47 scores are at least 90% efficient in
/detecting the similarity of sequences diverged by anywhere from 21 to 68 PAMs
/(82% to 55% sequence conservation), which seems like the most typical range of
/similarity sought.
What do you mean when you say the search is 90% efficient? Does this
mean that up to 10% of the matches in this range could be missed, and that
an even higher proportion of matches outside the range (for example, 100%
identity) could be missed as well?
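
One possible reading of "90% efficient", offered as an assumption and not as the original author's definition: the average score per position (in bits) obtained with the PAM-47 scores on sequences that have really diverged by D PAMs, divided by the average score that the ideal scores for distance D would give. The sketch below assumes the same simple model as the table appears to use (uniform base frequencies, equal substitution rates, d PAMs = d/100 substitutions per site); `efficiency` and the other names are hypothetical.

```python
import math

BACKGROUND = 0.25  # assumption: all four bases equally frequent


def fraction_conserved(pam):
    """Expected fraction of identical positions after `pam` PAMs."""
    return BACKGROUND + (1.0 - BACKGROUND) * math.exp(-4.0 * pam / 300.0)


def scores_bits(pam):
    """Match and mismatch log-odds scores in bits (requires pam > 0)."""
    q = fraction_conserved(pam)
    match = math.log2(q / BACKGROUND)
    mismatch = math.log2((1.0 - q) / 3.0 / BACKGROUND)
    return match, mismatch


def efficiency(matrix_pam, true_pam):
    """Bits per position achieved with the `matrix_pam` scores, relative to
    the bits per position available at the true distance `true_pam`."""
    q = fraction_conserved(true_pam)
    used_match, used_mismatch = scores_bits(matrix_pam)
    best_match, best_mismatch = scores_bits(true_pam)
    achieved = q * used_match + (1.0 - q) * used_mismatch
    available = q * best_match + (1.0 - q) * best_mismatch
    return achieved / available


if __name__ == "__main__":
    # Distances chosen to straddle the quoted 21 - 68 PAM range.
    for true_pam in (1, 10, 21, 47, 68, 90):
        print(f"true distance {true_pam:3d} PAMs: "
              f"efficiency of PAM-47 scores = {efficiency(47, true_pam):.2f}")
```

If this reading is right, a lower efficiency would not mean that a fixed fraction of matches is missed. It would mean that each aligned position contributes less information, so a similarity outside the range needs a proportionally longer alignment to reach the same score.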
On an unrelated topic, are there any plans by the people who
maintain the GenBank server to provide a means to restrict the amount of
output that BLAST returns? I received a message from GenBank that our
mailer had bounced the BLAST output because it was too large. The people
who provide the TCP/IP package for our VAX say that the SMTP protocol
automatically rejects mail larger than 1 MByte to prevent the malicious
squandering of system resources.
Stephen Clark
clark at galen.oci.utoronto.ca (Internet)
clark at utoroci (Netnorth/Bitnet)
"For what it is worth, many of the legends accompanying figures in this journal
are not very different, although we take some care to provide each sentence
with a verb if the author has overlooked the need for one." -J.Maddox, Nature.