AAA compositions (sic)

POSTMAST at GUNBRF.bitnet
Wed Nov 20 17:38:00 EST 1991


In message <9111200739.AA07044 at genbank.bio.net>, posted to proteins,
Michael Clarke asked:
> Can anyone suggest how I might go about determining the amino acid
> composition of a group of related proteins?

In reply to an earlier question he posted to genbank-bb, I had said:
> These data are obtainable from the [PIR] PSQ program by using the command
>   USAGE/CURRENT/BRIEF
> after invoking the appropriate database, e.g.
>   PSQ PIR1
> Similar compositional frequencies for selected subsets of sequences can be
> obtained after first using the appropriate FIND command.

The PSQ program has several commands for selecting related entries.  Once a
set of entries has been selected, the USAGE command above produces the
composition table for that set.  For example,
  FIND HEMOGLOBIN
  USAGE/CURRENT/BRIEF
produces the following composition table for the 341 hemoglobin entries in
the PIR1 database.

Cumulative frequencies from 341 entries 48939 residues
  5584 11.4% Ala A  2196  4.5% Glu E   536  1.1% Met M  1077  2.2% Tyr Y
  1238  2.5% Arg R  3261  6.7% Gly G  2670  5.5% Phe F  4618  9.4% Val V
  1821  3.7% Asn N  2875  5.9% His H  1765  3.6% Pro P    61  0.1% Asx B
  2638  5.4% Asp D  1032  2.1% Ile I  3034  6.2% Ser S    39  0.1% Glx Z
   563  1.2% Cys C  5812 11.9% Leu L  2412  4.9% Thr T
  1220  2.5% Gln Q  3887  7.9% Lys K   600  1.2% Trp W
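
For readers without access to PSQ, pooled counts of this kind are simple to
compute from the sequences themselves.  What follows is only a minimal sketch
in Python, not the PSQ implementation; the function names composition and
format_table and the two example sequences are made up for illustration, and
the output only loosely imitates the layout of the table above.  It assumes
each sequence is a plain string of one-letter codes with no whitespace.

```python
from collections import Counter


def composition(sequences):
    """Return (total_residues, Counter of one-letter codes) pooled over all sequences."""
    counts = Counter()
    for seq in sequences:
        # Assumption: seq holds only one-letter residue codes, no spaces or digits.
        counts.update(seq.upper())
    return sum(counts.values()), counts


def format_table(sequences):
    """Render the pooled counts and percentages, one residue code per line."""
    total, counts = composition(sequences)
    lines = [f"Cumulative frequencies from {len(sequences)} entries {total} residues"]
    for code, n in sorted(counts.items()):
        lines.append(f"{n:6d} {100.0 * n / total:5.1f}% {code}")
    return "\n".join(lines)


# Two short made-up sequences, for illustration only (not real database entries).
example = ["VLSPADKTNV", "VHLTPEEKSA"]
print(format_table(example))
```

Ambiguity codes such as B, Z and X need no special handling here; they are
counted like any other letter, as in the tables in this message.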

Michael Clarke had earlier asked for and received composition tables for the
PIR and SWISS-PROT databases.  Also of possible interest is the composition
table for the protein sequences from the Brookhaven Protein Data Bank, as
held in the PIR NRL_3D database:

Cumulative frequencies from 1045 entries 177811 residues
 14991  8.4% Ala A  8733  4.9% Glu E  3268  1.8% Met M  6210  3.5% Tyr Y
  6845  3.8% Arg R 14740  8.3% Gly G  6563  3.7% Phe F 12693  7.1% Val V
  8625  4.9% Asn N  4036  2.3% His H  7741  4.4% Pro P    32  0.0% Asx B
  9831  5.5% Asp D  9236  5.2% Ile I 13280  7.5% Ser S    14  0.0% Glx Z
  3607  2.0% Cys C 14209  8.0% Leu L 11325  6.4% Thr T  2809  1.6%  X  X
  6167  3.5% Gln Q 10299  5.8% Lys K  2557  1.4% Trp W
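
Each percentage in such a table is simply the residue count divided by the
total number of residues.  As a rough consistency check, the Python sketch
below re-reads the NRL_3D table above, sums the counts, and compares the
recomputed percentages with the listed ones.  The name parse_table, the
regular expression and the rounding tolerance are illustrative assumptions,
not part of PSQ.

```python
import re

# The NRL_3D table, copied from the message above.
TABLE = """\
 14991  8.4% Ala A  8733  4.9% Glu E  3268  1.8% Met M  6210  3.5% Tyr Y
  6845  3.8% Arg R 14740  8.3% Gly G  6563  3.7% Phe F 12693  7.1% Val V
  8625  4.9% Asn N  4036  2.3% His H  7741  4.4% Pro P    32  0.0% Asx B
  9831  5.5% Asp D  9236  5.2% Ile I 13280  7.5% Ser S    14  0.0% Glx Z
  3607  2.0% Cys C 14209  8.0% Leu L 11325  6.4% Thr T  2809  1.6%  X  X
  6167  3.5% Gln Q 10299  5.8% Lys K  2557  1.4% Trp W
"""
STATED_TOTAL = 177811  # from the "Cumulative frequencies" header line

# One table cell: count, percentage, three-letter name, one-letter code.
CELL = re.compile(r"(\d+)\s+(\d+\.\d)%\s+(\S+)\s+(\S)")


def parse_table(text):
    """Map one-letter code -> (count, listed percentage)."""
    return {code: (int(n), float(pct)) for n, pct, _name, code in CELL.findall(text)}


cells = parse_table(TABLE)
total = sum(n for n, _pct in cells.values())

# Codes whose listed percentage differs from count/total by more than
# one-decimal rounding allows (0.05, plus a small margin for float error).
mismatches = [code for code, (n, pct) in cells.items()
              if abs(100.0 * n / total - pct) > 0.051]

print("counts sum to stated total:", total == STATED_TOTAL)
print("percentage mismatches:", mismatches)
```

The same check can be run on the hemoglobin table by substituting its rows
and its stated total of 48939 residues.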

------------------------------------------------------------------------
                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Identification Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 POSTMASTER at GUNBRF.BITNET


