How to handle selenocysteine in alignments?

Michael Black drmbb at mac.com
Tue Apr 22 03:47:21 EST 2003

I don't know of any matrices (certainly not the commonly used PAM, BLOSUM,
Gonnet, or MD series) that include weights for "U".  I can tell you that
BLAST (per NCBI's "What's New" announcement of May 17, 2002) does exactly
what you suggest: it treats the symbol "U" in the input data as an "X".
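
If you want to apply the same substitution yourself before running a tool
that rejects "U", something along these lines should work. This is only a
minimal sketch in Python, not what BLAST does internally: the function name
mask_selenocysteine and the example record are hypothetical, and it assumes
plain FASTA input in which header lines start with ">".

```python
def mask_selenocysteine(fasta_text, replacement="X"):
    """Replace the selenocysteine code U/u in the sequence lines of FASTA text.

    Header lines (those starting with '>') are passed through unchanged,
    so a 'U' inside a sequence identifier is not touched.
    """
    masked_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: keep exactly as given.
            masked_lines.append(line)
        else:
            # Sequence line: substitute both cases, preserving the case.
            line = line.replace("U", replacement.upper())
            line = line.replace("u", replacement.lower())
            masked_lines.append(line)
    return "\n".join(masked_lines)


# Hypothetical record, for illustration only.
record = ">example_selenoprotein U-containing\nMKUGAAUL"
print(mask_selenocysteine(record))        # mask as unknown residue 'X'
print(mask_selenocysteine(record, "C"))   # or as cysteine
```

Passing "C" instead of the default "X" substitutes cysteine, which is the
closest structural analogue (sulfur in place of selenium). Either way, keep
the original sequences so the "U" positions can be restored afterwards.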

Michael Black

On 4/15/03 5:51 AM, in article pgpmoose.200304151051.19454 at net.bio.net,
"Gordon D. Pusch" <gdpusch at NO.xnet.SPAM.com> wrote:

> I have recently found evidence that BLAST and FASTA do not properly handle
> the official IUPAC single-letter code 'U' for selenocysteine, presumably
> because it does not appear in either the PAM or BLOSUM matrices (although
> I have not been able to rule out hard-coding as a cause).
> Are substitution matrices available that include scores for selenocysteine?
> If not, what is the least harmful way of handling the selenocysteine
> character?
> Should it be changed to the code 'X' for an unknown amino acid?  Or should
> it be changed to the code for another amino acid with similar chemical and
> physical properties?  Would it be acceptable to change it to the extremely
> rare but still 'legal' character 'Z', the ambiguity code for glutamic acid
> or glutamine?  Any other suggestions?
> -- Gordon D. Pusch
> perl -e '$_ = "gdpusch\@NO.xnet.SPAM.com\n"; s/NO\.//; s/SPAM\.//; print;'

More information about the Bio-soft mailing list
