perl extraction of protein sequence
Nobuyuki Miyajima
miyajima at kazusa.or.jp
Mon Jul 7 09:09:50 EST 1997
From: wrp at alpha0.bioch.virginia.edu (William R. Pearson)
Subject: perl extraction of protein sequence
Date: Fri, 4 Jul 1997 15:25:06 GMT
>
> I am looking for a perl script to read a Genbank file and build
> a file of the translation products.
>
> Bill Pearson
>
Please try this.
Nobuyuki Miyajima
-----------------------------------------------
Kazusa DNA Research Institute
Department of Genome Informatics,
Chief Researcher
1532-3 Yana, Kisarazu, Chiba 292, Japan
TEL: +81-438-52-3917 FAX: +81-438-52-3918
E-mail: miyajima at kazusa.or.jp
======================================================
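#!/usr/local/bin/perl
# Extract the /translation value of CDS features from a GenBank file and
# print it as a single FASTA record. Note: if the file holds several CDS
# features, their translations are concatenated into that one record.
$infile = shift(@ARGV);
defined($infile) || die "Usage: $0 genbank_file\n";
$file = $infile;
$file =~ s#\.gb_pr##;    # strip the GCG file extension
$file =~ tr#a-z#A-Z#;    # upper-case the name for the FASTA header
$word = "";              # accumulated protein sequence
$counter = 0;            # residues printed on the current output line
$check = 0;              # 0: outside CDS, 1: inside CDS, 2: inside /translation
open(IN,$infile) || die "Cannot open $infile: $!";
while(<IN>){
    chomp;
    s#^\s+##;            # drop leading white space
    s#\s+$##;            # drop trailing white space (and any CR)
    if($check == 2){
        # Continuation line of a multi-line /translation. This is tested
        # first so that residues such as "CDS..." at the start of a line
        # are not mistaken for a new CDS feature.
        $word .= $_;
        if($word =~ s#\"$##){ $check = 0; }    # closing quote: done
    }
    elsif(m#^CDS\s#){
        $check = 1;
    }
    elsif($check == 1 && m#^/translation=\"(.+)#){
        $word .= $1;
        $check = 2;
        if($word =~ s#\"$##){ $check = 0; }    # value fits on one line
    }
}
close(IN);
print ">$file\n";
# Print the sequence, 70 residues per line.
@chars = split(//,$word);
foreach $char (@chars){
    print "$char";
    $counter++;
    if($counter == 70){
        $counter = 0;
        print "\n";
    }
}
($counter != 0) && (print "\n");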
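
If your GenBank file contains more than one CDS, you may prefer one FASTA record per translation instead of a single concatenated sequence. The following is only a sketch of that variant, not part of the script above. The names extract_translations and format_fasta, the "NAME_1, NAME_2, ..." header scheme, and the small sample entry are my own assumptions for illustration. It uses the same three-state scan (outside CDS, inside CDS, inside /translation) and reads the sample from an in-memory filehandle so it runs without an input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return one protein string per CDS /translation found on the filehandle.
# (Hypothetical helper name, chosen for this sketch.)
sub extract_translations {
    my ($fh) = @_;
    my @proteins;
    my $state = 0;    # 0: outside CDS, 1: inside CDS, 2: inside /translation
    my $seq   = '';
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;            # trim both ends (also drops CR/LF)
        if ($state == 2) {
            $seq .= $line;                  # continuation of the translation
        }
        elsif ($line =~ /^CDS\s/) {
            $state = 1;                     # a new CDS feature starts here
            next;
        }
        elsif ($state == 1 && $line =~ /^\/translation="(.*)/) {
            $seq   = $1;
            $state = 2;
        }
        else {
            next;
        }
        if ($seq =~ s/"$//) {               # closing quote ends this record
            push @proteins, $seq;
            ($seq, $state) = ('', 0);
        }
    }
    return @proteins;
}

# Format one FASTA record, wrapping the sequence at $width residues.
# (Hypothetical helper name; 70 matches the line length used above.)
sub format_fasta {
    my ($name, $seq, $width) = @_;
    $width ||= 70;
    my $out = ">$name\n";
    $out .= "$1\n" while $seq =~ /(.{1,$width})/g;
    return $out;
}

# Made-up sample entry, for demonstration only.
my $sample = <<'END';
     CDS             1..30
                     /gene="demoA"
                     /translation="MKVLAAGIVG
                     LLLAQ"
     gene            40..60
     CDS             40..60
                     /translation="MSTNPKPQR"
END

open(my $fh, '<', \$sample) or die "Cannot open sample: $!";
my @found = extract_translations($fh);
close($fh);

my $n = 0;
print format_fasta('DEMO_' . ++$n, $_) for @found;
```

For the sample above this prints two records, DEMO_1 (MKVLAAGIVGLLLAQ) and DEMO_2 (MSTNPKPQR). To use it on a real file, open the file instead of \$sample and derive the header name from the file name as in the original script.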