Search restriction to N-terminal?

Bob MacCallum bob at bsm.bioc.ucl.ac.uk
Thu Nov 28 12:50:16 EST 1996

Harold Drabkin (hdrabkin at mit.edu) wrote:
: Does anyone know of a program that could search the standard sequence
: databases of protein sequences for a particular di or tri peptide
: sequence, restricting the search to just the N-terminal? For example,
: how many proteins begin with V-D, or M-V-D?

If you have FASTA libraries and a machine that runs Perl,
save this as nterm.pl and give it a go:


$usage = "usage: nterm.pl fasta_library pattern\n";

# this program reads a *.fasta library file
# and spits out the IDs of sequences beginning
# with the desired pattern

$database = shift || die $usage;
$pattern = shift || die $usage;

open(DB, $database) || die "can't open $database";
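while (<DB>) {
  if (/^\>/) {
    $last_id = $_;  # remember the most recent header line
    $n = 0;         # sequence lines seen since that header
  } else {
    $n++;
    # only the first sequence line contains the N-terminus
    print $last_id if ($n==1 && /^$pattern/);
  }
}
close(DB);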

Or you could count the hits and divide by the total number of
sequences, or whatever...  I'll run a couple of queries if you just
want a quick answer.
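
The counting idea could be sketched roughly like this. It is only an illustration under the same assumption as the script above (residues start on the line right after each ">" header); the sub name count_nterm and the output format are made up for the example, and the pattern is treated as a Perl regular expression.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count sequences in a FASTA stream whose first residues match $pattern.
# Returns (hits, total). count_nterm is a hypothetical helper name.
sub count_nterm {
    my ($fh, $pattern) = @_;
    my ($hits, $total, $n) = (0, 0, 0);
    while (my $line = <$fh>) {
        if ($line =~ /^>/) {
            $total++;   # one header per sequence
            $n = 0;     # reset the per-sequence line counter
        } else {
            $n++;
            # only the first sequence line holds the N-terminus
            $hits++ if $n == 1 && $line =~ /^$pattern/;
        }
    }
    return ($hits, $total);
}

# Run as a script only when both arguments are supplied.
if (@ARGV == 2) {
    my ($database, $pattern) = @ARGV;
    open(my $db, '<', $database) or die "can't open $database: $!\n";
    my ($hits, $total) = count_nterm($db, $pattern);
    close($db);
    printf "%d of %d sequences (%.2f%%) begin with %s\n",
        $hits, $total, $total ? 100 * $hits / $total : 0, $pattern;
}
```

Run it the same way as nterm.pl, with the library file and the pattern as the two arguments. A pattern such as M?VD would count sequences beginning with either V-D or M-V-D in one pass.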

