IUBio

perl extraction of protein sequence

Nobuyuki Miyajima miyajima at kazusa.or.jp
Mon Jul 7 09:09:50 EST 1997


From: wrp at alpha0.bioch.virginia.edu (William R. Pearson)
Subject: perl extraction of protein sequence
Date: Fri, 4 Jul 1997 15:25:06 GMT

> 
> I am looking for a perl script to read a Genbank file and build
> a file of the translation products.
> 
> Bill Pearson
> 

Please try this.
Nobuyuki Miyajima
-----------------------------------------------
Kazusa DNA Research Institute
Department of Genome Informatics,
Chief Researcher
1532-3 Yana, Kisarazu, Chiba 292, Japan
TEL:  +81-438-52-3917 FAX:  +81-438-52-3918
E-mail:  miyajima at kazusa.or.jp
======================================================
#!/usr/local/bin/perl

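$infile = shift or die "Usage: $0 genbank_file\n";
$file = $infile;
$file =~ s#\.gb_pr##;       # strip the GCG GenBank extension
$file =~ tr#a-z#A-Z#;       # upper-cased name becomes the FASTA header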

$word = "";
$counter = 0;

open(IN,$infile) || die "Cannot open $infile: $!\n";
while(<IN>){
  chomp;
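  s#^\s+##;                   # drop the leading indentation
  # Look for a new CDS feature only while outside a translation:
  # a continuation line of residues can itself begin with "CDS".
  (!$check && m#^CDS\b#) && ($check = 1);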
  if($check == 1){
    if(m#^\/translation=\"(.+)#){
      $word .= $1;
      $check = 2;
      if($word =~ m#\"$#){
	$check = 0;
	$word =~ s#\"$##;
      }
    }
    else{ next; }
  }
  elsif($check == 2){
    $word .= $_;
    if($word =~ m#\"$#){
      $check = 0;
      $word =~ s#\"$##;
    }
  }
  else{ next; }
}
close(IN);
    
print ">$file\n";
@chars = split(//,$word);
foreach $char (@chars){
  print "$char";
  $counter++;
  if($counter == 70){
    $counter = 0;
    print "\n";
  }
}
($counter != 0) && (print "\n");
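
When a GenBank file holds several CDS features, the script above appends every /translation to the same $word and prints them as one sequence under a single header. The following is a minimal sketch of a variant that writes one FASTA record per CDS, assuming Perl 5.8 or later. The names extract_translations and format_fasta and the header layout (file name, running number, first line of the CDS location) are illustrative choices, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list of [location, protein] pairs, one for each CDS feature
# that carries a /translation qualifier. Only the first line of a
# multi-line location is kept (an assumption made for brevity).
sub extract_translations {
    my ($fh) = @_;
    my (@proteins, $location, $seq);
    my $in_translation = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+//;                  # drop the leading indentation
        if ($in_translation) {
            $seq .= $line;                  # continuation line of residues
        }
        elsif ($line =~ /^CDS\s+(\S+)/) {
            $location = $1;                 # start of a new CDS feature
            next;
        }
        elsif (defined $location && $line =~ /^\/translation="(.*)/) {
            $seq = $1;
            $in_translation = 1;
        }
        else { next; }
        if ($seq =~ s/"$//) {               # closing quote ends the qualifier
            push @proteins, [$location, $seq];
            $in_translation = 0;
            undef $location;
        }
    }
    return @proteins;
}

# Format one FASTA record, wrapping the sequence at $width residues.
sub format_fasta {
    my ($header, $seq, $width) = @_;
    $width ||= 70;
    my $out = ">$header\n";
    $out .= "$1\n" while $seq =~ /(.{1,$width})/g;
    return $out;
}

# Run only when a file name is given on the command line.
if (@ARGV) {
    my $infile = shift @ARGV;
    open(my $fh, '<', $infile) or die "Cannot open $infile: $!\n";
    my $n = 0;
    for my $cds (extract_translations($fh)) {
        my ($location, $seq) = @$cds;
        $n++;
        print format_fasta("${infile}_cds$n $location", $seq);
    }
    close($fh);
}
```

Saved under a hypothetical name such as gb2pep.pl, it would be run as "perl gb2pep.pl entry.gb_pr > entry.pep".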




More information about the Bio-soft mailing list
