Dear Info-GCG readers:
I would like to offer for your evaluation an alternative matrix for
use with the program TFASTA. One problem with TFASTA has been the excessive
number of alignments to obviously closed reading frames, which sometimes
pushes significant alignments down to 100th or 150th place, where few users
look. This is especially the case with protein
families with only short motifs and generally weak similarities, such as
transmembrane transport proteins or phage-type integrases. Also, proteins
rich in certain amino acids such as tryptophan can produce alignments
to translations like XWWVWX with "100% similarity in 4 aa overlap". This
is a direct consequence of the scoring matrix used. We have been using,
to the satisfaction of our users, an alternative matrix in which a penalty
of -9 is applied to an alignment to a stop codon and in which the scores
for certain identities, e.g. Trp-Trp and Cys-Cys, are reduced, since their
codons occur often in closed reading frames. The result is to bring
several significant alignments up out of the "noise", while the INITN scores
of significant alignments change by only a point or two, depending on
whether the query protein or the database "protein" is shorter. Alignments
to CRFs obtain substantially lower scores.
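
For readers who want to see the idea in miniature, here is a rough Python
sketch of the two adjustments described above: a fixed penalty for aligning
to a stop codon, and reduced scores for selected identities. It is not the
contents of tfastapep.cmp. The function names, the "*" stop symbol, the toy
three-residue matrix (PAM250-like values), the assumed stop score of 0 in the
unmodified matrix, and the reduced Trp-Trp and Cys-Cys values are all
assumptions made for illustration; only the -9 stop penalty comes from the
text.

```python
STOP = "*"  # assumed symbol for a translated stop codon


def penalize_closed_frames(matrix, stop_penalty=-9, reduced_identities=None):
    """Return an adjusted copy of a symmetric scoring matrix.

    matrix: dict mapping (residue, residue) pairs to integer scores.
    stop_penalty: score for aligning any residue to a stop codon.
    reduced_identities: dict of residue -> new identity score.
    """
    adjusted = dict(matrix)
    residues = {r for pair in matrix for r in pair} | {STOP}
    # Any alignment to a stop codon receives the same penalty.
    for r in residues:
        adjusted[(r, STOP)] = stop_penalty
        adjusted[(STOP, r)] = stop_penalty
    # Lower the reward for identities that are common in closed frames.
    for r, score in (reduced_identities or {}).items():
        adjusted[(r, r)] = score
    return adjusted


def ungapped_score(matrix, query, subject):
    """Sum the pairwise scores of two equal-length, ungapped sequences."""
    return sum(matrix[(q, s)] for q, s in zip(query, subject))


def symmetric(pairs):
    """Expand {(a, b): score} so that (b, a) has the same score."""
    full = {}
    for (a, b), score in pairs.items():
        full[(a, b)] = score
        full[(b, a)] = score
    return full


# Toy matrix for illustration only; stop scores of 0 are an assumption.
toy = symmetric({
    ("W", "W"): 17, ("C", "C"): 12, ("V", "V"): 4,
    ("W", "V"): -6, ("W", "C"): -8, ("V", "C"): -2,
    ("W", STOP): 0, ("C", STOP): 0, ("V", STOP): 0, (STOP, STOP): 0,
})

# Hypothetical reduced identity scores; the real values are in the file.
adjusted = penalize_closed_frames(toy, reduced_identities={"W": 10, "C": 8})

before = ungapped_score(toy, "WWVWC", "WWVW*")
after = ungapped_score(adjusted, "WWVWC", "WWVW*")
print(before, after)
```

With these made-up numbers, a tryptophan-rich match that runs into a stop
codon loses most of its score, which is the effect the modified matrix is
meant to have on alignments to closed reading frames.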
In GCG version 7 it was necessary to alter the file FASTA_UPAM.H and
then recompile the program. In GCG version 8, TFASTA accepts a command-line
switch, e.g. -DATA=tfastapep.cmp. Since the file is too wide
for some E-mail systems, it is available by anonymous FTP from:
ftp.ulaval.ca
The file is called tfastapep.cmp
and is in the directory /contrib/gcg
Try it, you'll like it. I would appreciate comments or suggested
improvements.
Paul H. Roy
Departement de biochimie,FSG
Universite Laval
Quebec, QC G1K 7P4
CANADA
Internet: 2020000 at SAPHIR.ULAVAL.CA
or: PROY at RSVS.ULAVAL.CA
Bitnet: 2020000 at LAVALVX1
Phone: +1 418 654 2705
FAX: +1 418 654 2715