Rich Dudley asked:
> Does anyone know of a program (WW or Windows) that can enumerate the di-
> and tri-peptide frequencies in a protein? Ideally, it would construct
> a table at the end of the input and have the sequence and number of
> occurrences.
>
This isn't any help, but I figured my code was hard enough to
understand that I would post it anyway <grin>.
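Here's a Perl one-liner that counts overlapping dipeptides, assuming
single-letter sequences with one record per line. It counts the pairs
starting at even offsets, drops the first residue, then counts again to
pick up the pairs at odd offsets. The counts start over for each line.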
perl -ne '%dict=();
s/(..)/$dict{$1}++,$1/ge;$_=substr($_,-(length)+1);
s/(..)/$dict{$1}++,$1/ge;
foreach $k (keys %dict) {print "$k $dict{$k}\n"}'
ANAANOPOANO
OA 1
AA 1
NO 2
OP 1
NA 1
PO 1
AN 3
(Yeah! And "O" is the 21st Beatle^H^H^H^H^H^Hamino acid :)
For tripeptides that's:
perl -ne '%dict=();
s/(...)/$dict{$1}++,$1/ge;$_=substr($_,-(length)+1);
s/(...)/$dict{$1}++,$1/ge;$_=substr($_,-(length)+1);
s/(...)/$dict{$1}++/ge;
foreach $k (keys %dict) {print "$k $dict{$k}\n"}'
ANANAPANA
ANA 3
APA 1
NAN 1
PAN 1
NAP 1
Intuitively obvious to the most casual of observers, yes?
Andrew