I don't know of any matrices (certainly none of the commonly used PAM,
BLOSUM, GONNET or MD series) that include weights for "U". I can tell you
that BLAST (per NCBI's "What's New" announcement of May 17/02) does exactly
as you suggest: it treats the symbol "U" as an "X" in the input data.
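
If you want the same behaviour from a tool that does not do this for you,
one option is to mask the "U" yourself before running the search. The
following is only a rough sketch in Python, assuming FASTA-formatted input;
the function name mask_selenocysteine and its replacement parameter are my
own invention, not part of BLAST or FASTA.

```python
def mask_selenocysteine(fasta_text, replacement="X"):
    """Return FASTA text with 'U'/'u' in sequence lines replaced.

    Header lines (those starting with '>') are left untouched, so a 'U'
    in a sequence name or description is not altered. The default
    replacement 'X' mirrors the "treat U as X" approach; passing another
    single-letter code is possible but is a judgment call.
    """
    # Map upper- and lower-case U to the same-case replacement.
    table = str.maketrans({"U": replacement.upper(),
                           "u": replacement.lower()})
    masked = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            masked.append(line)                   # header: keep as-is
        else:
            masked.append(line.translate(table))  # sequence: mask U
    return "\n".join(masked)


if __name__ == "__main__":
    example = ">seq1 GPU-like protein\nMCUGA\nuu"
    print(mask_selenocysteine(example))
```

Keep the unmasked file as well, since the substitution throws away the
positions where the selenocysteine residues were.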
--
Michael Black
On 4/15/03 5:51 AM, in article pgpmoose.200304151051.19454 at net.bio.net,
"Gordon D. Pusch" <gdpusch at NO.xnet.SPAM.com> wrote:
> I have recently found evidence that BLAST and FASTA do not properly handle
> the official IUPAC single-letter code 'U' for selenocysteine, presumably
> because it does not appear in either the PAM or BLOSUM matrices (although
> I have not been able to rule out hard-coding as a cause).
>
> Are substitution matrices available that include scores for selenocysteine?
> If not, what is the least harmful way of handling the selenocysteine
> character?
> Should it be changed to the code 'X' for an unknown amino acid? Or should
> it be changed to the code for another amino acid with similar chemical and
> physical properties? Would it be acceptable to change it to the extremely
> rare but still 'legal' character 'Z' for glutamic acid or glutamine? Any
> other suggestions?
>
> -- Gordon D. Pusch
>
> perl -e '$_ = "gdpusch\@NO.xnet.SPAM.com\n"; s/NO\.//; s/SPAM\.//; print;'