Hi Netters,
I was wondering whether it is possible to translate a set of N large DNA
sequence files (say, a prokaryotic genome) into all 6 reading frames with a
single command line. I have the DNA sequence files in GCG format and a file
listing the names of these files.
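To make clear what I mean by a six-frame translation, here is a minimal
sketch in Python (my own illustration, not a GCG command). It assumes the
sequence has already been read from the GCG file into a plain string and
that the standard genetic code applies; the names six_frames, translate and
reverse_complement are made up for this example.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the standard code is appropriate for the organism in question).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) give X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGGCCTAA").items():
        print(label, protein)
```

Stop codons are kept as * rather than splitting the translation, so that
frameshifts and spurious stops stay visible. Reading the GCG files and
looping over the list of file names is not covered here.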
My rationale for translating all six frames is that the sequence annotations,
which I think are what the TrEMBL protein database (which takes some time to
update anyway) is based on, might miss subtle features such as translational
coupling, frameshifts and, yes, sequencing errors that introduce stop codons.
What I would do next is build a dataset from these 6N protein translations
using something like (Unix GCG)
>dataset @6Nprotein.list{*}
(I think I know how to do that) and then run FASTA searches against this
dataset.
Thanks for any input,
Stephane
-----------------------------------------------------------------
Stephane Vuilleumier
Mikrobiologisches Institut
ETH-Zurich Tel: (+41) 1 632 33 57
ETH-Zentrum/LFV Fax: (+41) 1 632 11 48
8092 Zurich email: svuilleu at micro.biol.ethz.ch
Switzerland http://www.micro.biol.ethz.ch/sv1.htm
-----------------------------------------------------------------