In response to Tonu Margus's question, the following script works with
GenBank flat files that have the feature key "CDS" in the features table. I
wrote this script to extract the translational start and stop points from a
GenBank file and translate the DNA sequence from that file into an amino
acid sequence.
#!/bin/csh -f
#
# script: translate.fil author: Joseph A. White 10/96
#
# Script to extract the translational start and stop sites from a DNA
# sequence file and translate the file, writing the result to a file with
# the extension .pep.
#
# The sequence file name is the first command-line argument.
set file = $1
set begin = ` grep 'CDS' $file | cut -c22-40 | cut -f1 -d"." `
set end = ` grep 'CDS' $file | cut -c22-40 | cut -f2 -d" " | cut -c2-7 `
echo $begin, $end
translate $file -default -beg=$begin -end=$end -out=$file.pep
To use the script, enter the following command:
translate.fil <filename>
The script takes a single sequence file name and writes the translation to
a file of the same name with ".pep" appended.
There are two problems that could occur in using this script:
1. If the coding sequence ("CDS") is listed as a series of joined exon
coding parts of a sequence file, the script will fail to translate the
correct parts of the sequence.
2. If the start and stop base pair numbers are not in columns 22-40 of
the line containing "CDS", the script will incorrectly translate the DNA
sequence.
The second problem can be fixed by changing the column range that cut uses
to extract the positions. The first problem requires much more work, since
the script would have to handle each exon range separately.
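As one possible way around the fixed-column problem, the location can be
taken from the second whitespace-separated field of the CDS line instead of
from columns 22-40. The following is only a sketch in Bourne shell (not csh),
and cds_range is a hypothetical helper name, not part of the scripts above.
It assumes the feature line looks like "CDS  start..end", accepts the "<" and
">" partial markers, and simply refuses join(...) or complement(...)
locations rather than handling them.

```shell
#!/bin/sh
# cds_range FILE
# Hypothetical helper: print "begin end" for the first CDS feature in FILE.
# Returns a non-zero status if there is no CDS line or if its location is
# not a plain start..end range (for example join(...) or complement(...)).
cds_range() {
    awk '
        $1 == "CDS" {
            loc = $2
            # Accept only a simple range, with optional < and > markers.
            if (loc ~ /^<?[0-9]+[.][.]>?[0-9]+$/) {
                gsub(/[<>]/, "", loc)
                split(loc, p, "[.][.]")
                print p[1], p[2]
                found = 1
            }
            exit
        }
        END { exit !found }
    ' "$1"
}
```

The two numbers printed could then be supplied as the -beg and -end values.
Matching on the first field also avoids picking up other lines that merely
contain the text "CDS".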
A script that accepts a file of sequence file names and translates each file
in turn is shown below.
#!/bin/csh -f
#
# script: translate.seqfiles author: Joseph A. White 10/96
#
# Script to extract the translational start and stop sites from a group of DNA
# sequence files and translate each file, giving each output file a filename
# extension of .pep.
#
foreach file (`cat $1`)
echo $file
set begin = ` grep 'CDS' $file | cut -c22-40 | cut -f1 -d"." `
set end = ` grep 'CDS' $file | cut -c22-40 | cut -f2 -d" " | cut -c2-7 `
echo $begin, $end
translate $file -default -beg=$begin -end=$end -out=$file.pep
# Count the lines of the translation that contain a stop codon ("*").
set lines = `grep -c '*' $file.pep`
if ($lines > 1) then
echo $file.pep $lines >> check.pep
echo "$file has not been translated properly."
else
echo $file.pep >> tfiles.pep
endif
end
To use the script, enter the following command:
translate.seqfiles <file_of_sequence_names>
The script takes a file that lists sequence file names and produces a file
with the extension ".pep" for each one. It appends the name of each
translated file to "tfiles.pep". If more than one line of a translation
contains a stop codon ("*"), which suggests a stop codon within the
translated sequence, the script reports that the file has not been
translated properly and records it in "check.pep" instead.
The script is prone to the same problems that the first has.
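The stop codon check could also count the internal stops directly rather
than counting lines. The following is a rough sketch in Bourne shell (not
csh). internal_stops is a hypothetical name, and the sketch assumes the file
it is given holds only the residue letters, with no header or position
numbers, which may not be true of the files that translate writes.

```shell
#!/bin/sh
# internal_stops FILE
# Hypothetical helper: print the number of "*" characters in FILE, not
# counting a single "*" at the very end of the sequence.
# Assumes FILE contains only the peptide sequence (no header lines).
internal_stops() {
    tr -d ' \t\r\n' < "$1" |   # join the sequence into one string
        sed 's/\*$//' |        # drop one terminal stop, if present
        tr -cd '*' |           # keep only the remaining stop symbols
        wc -c | tr -d ' '      # count them
}
```

A result greater than zero would mark the translation for checking.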
Joe White
At 12:30 PM 5/7/98 GMT, you wrote:
>Hi,
>Is there program in GCG or EGCG what can extract protein seq
>from NH seq annotation?
>If yes in EGCG then from wher can I downloud it?
>Tonu Margus
Joe White
e-mail: joe.white at wmich.edu
snailmail: Dept. of Chemistry
Western Michigan University
Kalamazoo, MI 49008
phone: (616) 387-2895
fax: (616) 387-2909