6-frame translation of a list of (large) DNA sequences in one go?

Paul Roy proy at rsvs.ulaval.ca
Wed Nov 20 09:30:12 EST 1996

On Wed, 20 Nov 1996, Stephane Vuilleumier wrote:
> I was wondering whether it is possible to translate a set of N large DNA
> sequence files (say, a prokaryotic genome) into all 6 reading frames in one
> command line.  I have the DNA sequence files in GCG format and a file with
> the names of these files.
> The rationale for doing this is that I feel the sequence annotations, which
> I think are used in the TrEMBL protein database (which takes some time to
> update anyway), might miss some subtle things such as translational coupling,
> frameshifts and, yes, sequencing errors introducing stop codons.
> What I would do next is build a dataset with these 6N protein translations

(stuff deleted)

     Why not just use TFASTA, which does essentially the same thing:
"on-the-fly" translation of all 6 frames followed by protein-protein
comparison?  The difference is that TFASTA neither needs nor saves
the translations in a database.
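
To make "translation of all 6 frames" concrete, here is a minimal Python
sketch. It is not TFASTA's or GCG's code; it assumes the standard genetic
code, plain A/C/G/T input, and hypothetical function names of my choosing.
Any codon containing another symbol (e.g. N) is written as X, and stop
codons as *.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Looping this over a list of sequence files would give the 6N translations
asked about; reading GCG-format files is left out of the sketch.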
     I have found that fastapep.cmp doesn't work very well for TFASTA:
there are lots of alignments to obviously closed reading frames.  However,
applying a penalty to alignments that contain a stop codon, and reducing the
score for certain matches, e.g. Cys-Cys and Trp-Trp, works much better and
brings significant alignments up out of the "noise".  Indeed, TFASTA does
reveal sequencing errors in real reading frames: these show up as the
same file listed twice, in two different reading frames.
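
The matrix adjustment described above can be sketched in Python as follows.
This does not read or write the actual fastapep.cmp format; the matrix is a
toy dictionary, and the scores, the stop penalty of -10 and the reduced
Cys-Cys and Trp-Trp values are illustrative assumptions, not the values
used in practice.

```python
STOP = "*"


def adjust_matrix(scores, stop_penalty, reduced):
    """Return a copy of a {(a, b): score} matrix with two changes:
    every pairing that involves a stop codon gets stop_penalty, and the
    self-match score of each residue in `reduced` is lowered to the given
    value (it is never raised)."""
    adjusted = dict(scores)
    residues = {r for pair in scores for r in pair}
    for residue, new_score in reduced.items():
        old = adjusted.get((residue, residue), new_score)
        adjusted[(residue, residue)] = min(old, new_score)
    for residue in residues | {STOP}:
        adjusted[(residue, STOP)] = stop_penalty
        adjusted[(STOP, residue)] = stop_penalty
    return adjusted


def score_ungapped(query, frame, scores, default=0):
    """Sum pair scores over an ungapped alignment of two equal-length strings."""
    return sum(scores.get((q, t), default) for q, t in zip(query, frame))


# Toy matrix with illustrative values only.
toy = {
    ("A", "A"): 2,
    ("C", "C"): 12,
    ("W", "W"): 17,
    ("A", "C"): -2,
    ("C", "A"): -2,
}
adjusted = adjust_matrix(toy, stop_penalty=-10, reduced={"C": 6, "W": 8})

# A query aligned against a translated frame that runs through a stop codon.
print(score_ungapped("ACW", "A*W", toy))       # 19 with the unmodified matrix
print(score_ungapped("ACW", "A*W", adjusted))  # 0 once stops are penalized
```

With the unmodified toy matrix the high Trp-Trp score carries the alignment
across the stop codon; after the adjustment the same alignment scores no
better than chance.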


 Paul H. Roy                             Phone:  +1 418 654 2705
 Departement de biochimie,FSG            FAX:    +1 418 654 2715
 Universite Laval                        E-mail: proy at rsvs.ulaval.ca
 Quebec, QC  G1K 7P4


More information about the Info-gcg mailing list
