alternative matrix for use with GCG version of TFASTA

2020000%LAVALVX1.bitnet at CUNYVM.CUNY.EDU
Fri Oct 21 14:21:53 EST 1994

Dear Info-GCG readers:                                                          
     I would like to offer for your evaluation an alternative matrix for        
use with the program TFASTA.  One problem with TFASTA has been the (too)
numerous alignments to obviously closed reading frames, sometimes
resulting in significant alignments being pushed down into 100th or 150th       
place, where few users look.  This is especially the case with protein          
families with only short motifs and generally weak similarities, such as        
transmembrane transport proteins or phage-type integrases.  Also, proteins      
rich in certain amino acids such as tryptophan can result in alignments         
to translations like XWWVWX with "100% similarity in 4 aa overlap". This        
is a direct consequence of the scoring matrix used.  We have been using,        
to the satisfaction of our users, an alternative matrix in which a penalty      
of -9 is applied to an alignment to a stop codon and in which the scores
for certain identities, e.g. Trp-Trp and Cys-Cys, are reduced, since their
codons occur often in closed reading frames.  The result is to bring
several significant alignments up out of the "noise", while the INITN
scores of significant alignments change by only a point or two, depending
on whether the query protein or the database "protein" is shorter.
Alignments to closed reading frames (CRFs) obtain substantially lower scores.
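To make the arithmetic concrete, here is a minimal Python sketch (not the
contents of tfastapep.cmp) of how the two changes described above alter an
ungapped score.  It assumes PAM250 identity scores (Trp-Trp 17, Cys-Cys 12,
Val-Val 4) as the starting matrix; the reduction amounts and the helper names
make_scorer and ungapped_score are hypothetical, chosen only for illustration.
Only the -9 stop penalty comes from the matrix described here.

```python
STOP = "*"  # stop codons appear as "*" in the translated database sequence

# Assumption: PAM250 identity scores stand in for the default TFASTA matrix.
PAM250_IDENTITY = {"W": 17, "C": 12, "V": 4}


def make_scorer(identity, reductions=None, stop_penalty=-9):
    """Build a toy pair-scoring function.

    identity     -- identity score per residue (only identities are modelled)
    reductions   -- points subtracted from selected identities (hypothetical)
    stop_penalty -- score for pairing any residue with a stop codon
    """
    reductions = reductions or {}
    table = {aa: s - reductions.get(aa, 0) for aa, s in identity.items()}

    def score(a, b):
        if STOP in (a, b):
            return stop_penalty
        if a != b:
            raise KeyError(f"{a}/{b}: mismatches are not modelled here")
        return table[a]

    return score


def ungapped_score(query, target, score):
    """Sum pair scores over an ungapped overlap of equal length."""
    if len(query) != len(target):
        raise ValueError("overlap lengths differ")
    return sum(score(a, b) for a, b in zip(query, target))


original = make_scorer(PAM250_IDENTITY)
# Hypothetical reductions: Trp-Trp 17 -> 10, Cys-Cys 12 -> 8.
adjusted = make_scorer(PAM250_IDENTITY, reductions={"W": 7, "C": 4})

# The 4-aa "100% similarity" core of a translation like XWWVWX
print(ungapped_score("WWVW", "WWVW", original))  # 55
print(ungapped_score("WWVW", "WWVW", adjusted))  # 34
# The same overlap running through a stop codon
print(ungapped_score("WWVW", "WW*W", adjusted))  # 21
```

Under these assumed values the Trp-rich 4-residue overlap drops from 55 to
34, and running through a stop codon costs a further 13 points relative to
the Val-Val pair it replaces.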
     In GCG version 7 it was necessary to alter the file FASTA_UPAM.H and       
then recompile the program.  In GCG version 8, TFASTA instead accepts a
command-line switch, e.g.

        -DATA=tfastapep.cmp

Since the file is too wide for some E-mail systems, it has been made
obtainable by anonymous FTP to:
The file is called     tfastapep.cmp                                            
and is in the directory    /contrib/gcg                                         
     Try it, you'll like it.  I would appreciate comments or suggestions.
                                       Paul H. Roy                              
                                       Departement de biochimie, FSG
                                       Universite Laval                         
                                       Quebec, QC  G1K 7P4                      
                                       Internet: 2020000 at SAPHIR.ULAVAL.CA       
                                             or: PROY at RSVS.ULAVAL.CA            
                                       Bitnet: 2020000 at LAVALVX1                 
                                       Phone: +1 418 654 2705                   
                                       FAX:   +1 418 654 2715                   
