Harold Drabkin (hdrabkin at mit.edu) wrote:
: Does anyone know of a program that could search the standard sequence
: databases of protein sequences for a particular di or tri peptide
: sequence, restricting the search to just the N-terminal? For example,
: how many proteins begin with V-D, or M-V-D?
If you have FASTA libraries and a machine which runs perl,
save this as nterm.pl and give it a go. Give the pattern
without the hyphens, e.g. VD or MVD.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#!/usr/local/bin/perl
# this program reads a *.fasta library file
# and prints the IDs of sequences beginning
# with the desired pattern
$usage = "usage: nterm.pl fasta_library pattern\n";
$database = shift || die $usage;
$pattern = shift || die $usage;
open(DB, $database) || die "can't open $database: $!\n";
while (<DB>)
{
    if (/^\>/)
    {
        # header line: remember the ID and restart the line count
        $last_id = $_;
        $n = 0;
    }
    # $n is 1 only on the first sequence line after a header,
    # i.e. the line holding the N-terminus
    print $last_id if ($n == 1 && /^$pattern/);
    $n++;
}
close(DB);
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
You could also count the hits and divide by the total number
of sequences to get a frequency. I'll run a couple of queries
if you just want a quick answer.
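For the counting idea, here is a rough sketch rather than a finished tool. It uses the same first-line-after-the-header trick as nterm.pl, so it assumes each entry's sequence starts on the line right after its ">" header. The sub name count_nterm_hits is made up for illustration, and the pattern is still treated as a perl regular expression.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Count FASTA entries whose first sequence line starts with $pattern.
# Returns (hits, total entries). Assumes the sequence begins on the
# line immediately following each ">" header.
sub count_nterm_hits {
    my ($fh, $pattern) = @_;
    my ($hits, $total, $n) = (0, 0, 0);
    while (my $line = <$fh>) {
        if ($line =~ /^>/) {
            # header line: one more entry, restart the line count
            $total++;
            $n = 0;
        }
        # $n is 1 only on the line right after a header
        $hits++ if $n == 1 && $line =~ /^$pattern/;
        $n++;
    }
    return ($hits, $total);
}

# Only runs when both arguments are given on the command line.
if (@ARGV == 2) {
    my ($database, $pattern) = @ARGV;
    open(my $db, '<', $database) or die "can't open $database: $!\n";
    my ($hits, $total) = count_nterm_hits($db, $pattern);
    close($db);
    printf "%d of %d sequences (%.2f%%) begin with %s\n",
        $hits, $total, $total ? 100 * $hits / $total : 0, $pattern;
}
```

Run it the same way as nterm.pl, with the library file first and the pattern second.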
--
++++++++++++++++++++++++++++++ Bob MacCallum ++++++++++++++++++++++++++++++
+++++++++++++++ Biomolecular Structure and Modelling Group ++++++++++++++++
++++++++++++ Department of Biochemistry and Molecular Biology +++++++++++++
++++++++++++++++ University College London, WC1E 6BT, UK ++++++++++++++++++