In message <9111200739.AA07044 at genbank.bio.net>, posted to proteins,
Michael Clarke asked:
> Can anyone suggest how I might go about determining the amino acid
> composition of a group of related proteins?
In reply to an earlier question he posted to genbank-bb, I had said:
> These data are obtainable from the [PIR] PSQ program by using the command
> USAGE/CURRENT/BRIEF
> after invoking the appropriate database, e.g.
> PSQ PIR1
> Similar compositional frequencies for selected subsets of sequences can be
> obtained after first using the appropriate FIND command.
The PSQ program has several commands useful for selecting appropriately
related entries. Once a set of entries is selected, the USAGE command above
produces the composition table for that set. For example,
    FIND HEMOGLOBIN
    USAGE/CURRENT/BRIEF
produces the following composition table for the 341 hemoglobin entries in
the PIR1 database.
Cumulative frequencies from 341 entries 48939 residues
 5584 11.4% Ala A   2196  4.5% Glu E    536  1.1% Met M   1077  2.2% Tyr Y
 1238  2.5% Arg R   3261  6.7% Gly G   2670  5.5% Phe F   4618  9.4% Val V
 1821  3.7% Asn N   2875  5.9% His H   1765  3.6% Pro P     61  0.1% Asx B
 2638  5.4% Asp D   1032  2.1% Ile I   3034  6.2% Ser S     39  0.1% Glx Z
  563  1.2% Cys C   5812 11.9% Leu L   2412  4.9% Thr T
 1220  2.5% Gln Q   3887  7.9% Lys K    600  1.2% Trp W
He had earlier asked for and received composition tables for the PIR and
SWISS-PROT databases. Also of possible interest is the composition table
for the PIR NRL_3D database, which holds the protein sequences from the
Brookhaven Protein Data Bank.
Cumulative frequencies from 1045 entries 177811 residues
14991  8.4% Ala A   8733  4.9% Glu E   3268  1.8% Met M   6210  3.5% Tyr Y
 6845  3.8% Arg R  14740  8.3% Gly G   6563  3.7% Phe F  12693  7.1% Val V
 8625  4.9% Asn N   4036  2.3% His H   7741  4.4% Pro P     32  0.0% Asx B
 9831  5.5% Asp D   9236  5.2% Ile I  13280  7.5% Ser S     14  0.0% Glx Z
 3607  2.0% Cys C  14209  8.0% Leu L  11325  6.4% Thr T   2809  1.6% X   X
 6167  3.5% Gln Q  10299  5.8% Lys K   2557  1.4% Trp W
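
One way to put the two tables side by side is to divide each residue's
frequency in the hemoglobin set by its frequency in NRL_3D. The sketch
below does that in Python using the counts exactly as printed above. The
function names are hypothetical, and the ratio is just one simple
comparison among many, not something PSQ reports.

```python
# Residue counts copied from the two tables in this post.
HEMOGLOBIN = {
    "A": 5584, "R": 1238, "N": 1821, "D": 2638, "C": 563, "Q": 1220,
    "E": 2196, "G": 3261, "H": 2875, "I": 1032, "L": 5812, "K": 3887,
    "M": 536, "F": 2670, "P": 1765, "S": 3034, "T": 2412, "W": 600,
    "Y": 1077, "V": 4618, "B": 61, "Z": 39,
}
NRL_3D = {
    "A": 14991, "R": 6845, "N": 8625, "D": 9831, "C": 3607, "Q": 6167,
    "E": 8733, "G": 14740, "H": 4036, "I": 9236, "L": 14209, "K": 10299,
    "M": 3268, "F": 6563, "P": 7741, "S": 13280, "T": 11325, "W": 2557,
    "Y": 6210, "V": 12693, "B": 32, "Z": 14, "X": 2809,
}


def fractions(counts):
    """Convert raw counts to fractions of the total residue count."""
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def enrichment(subset, reference):
    """Ratio of subset frequency to reference frequency for each code
    present in the subset and non-zero in the reference."""
    sub, ref = fractions(subset), fractions(reference)
    return {code: sub[code] / ref[code] for code in sub if ref.get(code)}


ratios = enrichment(HEMOGLOBIN, NRL_3D)
# List residues from most over-represented to most under-represented.
for code, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{code} {ratio:.2f}")
```

Ratios for the ambiguity codes B and Z rest on very small counts and should
not be read too closely.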
------------------------------------------------------------------------
Dr. John S. Garavelli
Database Coordinator
Protein Identification Resource
National Biomedical Research Foundation
Washington, DC 20007
POSTMASTER at GUNBRF.BITNET