SWISS-PROT RELEASE 27.0 RELEASE NOTES
1. INTRODUCTION
1.1 Evolution
Release 27.0 of SWISS-PROT contains 33329 sequence entries, comprising
11'484'420 amino acids abstracted from 32314 references. This represents
an increase of 5.6% over release 26. The recent growth of the data bank
is summarized below.
Release  Date   Number of entries  Number of amino acids
    3.0  11/86               4160                969 641
    4.0  04/87               4387              1 036 010
    5.0  09/87               5205              1 327 683
    6.0  01/88               6102              1 653 982
    7.0  04/88               6821              1 885 771
    8.0  08/88               7724              2 224 465
    9.0  11/88               8702              2 498 140
   10.0  03/89              10008              2 952 613
   11.0  07/89              10856              3 265 966
   12.0  10/89              12305              3 797 482
   13.0  01/90              13837              4 347 336
   14.0  04/90              15409              4 914 264
   15.0  08/90              16941              5 486 399
   16.0  11/90              18364              5 986 949
   17.0  02/91              20024              6 524 504
   18.0  05/91              20772              6 792 034
   19.0  08/91              21795              7 173 785
   20.0  11/91              22654              7 500 130
   21.0  03/92              23742              7 866 596
   22.0  05/92              25044              8 375 696
   23.0  08/92              26706              9 011 391
   24.0  12/92              28154              9 545 427
   25.0  04/93              29955             10 214 020
   26.0  07/93              31808             10 875 091
   27.0  10/93              33329             11 484 420
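As a quick check on the growth figure quoted above, the following sketch recomputes the relative increase between releases 26.0 and 27.0 from the last two rows of the table. It suggests that the 5.6% refers to the number of amino acids rather than to the number of entries; the function name is an illustrative choice and not part of any SWISS-PROT tool.

```python
# Figures copied from the last two rows of the table above.
entries = {"26.0": 31808, "27.0": 33329}
residues = {"26.0": 10_875_091, "27.0": 11_484_420}


def percent_growth(old: int, new: int) -> float:
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100.0


residue_growth = percent_growth(residues["26.0"], residues["27.0"])
entry_growth = percent_growth(entries["26.0"], entries["27.0"])

print(f"amino acids: {residue_growth:.1f}%")
print(f"entries:     {entry_growth:.1f}%")
```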
1.2 Source of data
Release 27.0 has been updated using protein sequence data from release
37.0 of the PIR (Protein Identification Resource) protein data bank, as
well as translation of nucleotide sequence data from release 36.0 of the
EMBL Nucleotide Sequence Database.
As an indication of the source of the sequence data in the SWISS-PROT
data bank, we list here the statistics concerning the DR (Database
cross-references) pointer lines:
Entries with pointer(s) to only PIR entry(ies):            4553
Entries with pointer(s) to only EMBL entry(ies):           4557
Entries with pointer(s) to both EMBL and PIR entry(ies):  23557
Entries with no pointer lines:                              662
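Statistics of this kind could be gathered with a short script along the following lines. This is only a sketch: it assumes the usual flat-file layout (a two-letter line code, entries terminated by a "//" line, and the data bank identifier as the first field of a DR line), the sample entries and their accession numbers are made up for illustration, and an entry is counted as "neither" when it has no EMBL or PIR pointer even if it points to other data banks.

```python
from collections import Counter


def split_entries(text):
    """Yield each entry as a list of lines; entries end with a '//' line."""
    entry = []
    for line in text.splitlines():
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)


def pointer_class(entry_lines):
    """Classify one entry by the data banks named on its DR lines."""
    banks = {
        line[2:].split(";")[0].strip()
        for line in entry_lines
        if line.startswith("DR ")
    }
    has_pir, has_embl = "PIR" in banks, "EMBL" in banks
    if has_pir and has_embl:
        return "both"
    if has_pir:
        return "only PIR"
    if has_embl:
        return "only EMBL"
    return "neither"


# Hypothetical sample entries, reduced to their ID and DR lines.
SAMPLE = """\
ID   TEST1_HUMAN
DR   EMBL; X00001; HSTEST1.
DR   PIR; A00001; A00001.
//
ID   TEST2_ECOLI
DR   PIR; B00002; B00002.
//
ID   TEST3_YEAST
DR   EMBL; X00003; SCTEST3.
DR   PROSITE; PS00001; ASN_GLYCOSYLATION.
//
ID   TEST4_MOUSE
//
"""

counts = Counter(pointer_class(entry) for entry in split_entries(SAMPLE))
print(counts)

# The four published figures add up to the total number of entries.
published_total = 4553 + 4557 + 23557 + 662
print(published_total)
```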
2. DESCRIPTION OF THE CHANGES MADE TO SWISS-PROT SINCE RELEASE 26
2.1 Sequences and annotations
About 1532 sequences have been added since release 26, the sequence data
of 213 existing entries has been updated and the annotations of 3000
entries have been revised. In particular we have used review articles
to update the annotations of the following groups or families of
proteins:
- Aspartate and glutamate racemases
- Bacteriophage T4 proteins
- Band 3 anion proteins
- Beta amylases
- Cysteine synthases
- Deoxyribonuclease I
- Ependymins
- Epimorphin family
- GTP1/Obg family
- Fork head domain proteins
- G-linked receptors family 2
- Glutamate 5-kinase
- HIT family
- Lysyl oxidases
- mutT domain proteins
- Nitrilases / cyanide hydratase
- Peripherin / rom-1
- Phosphatidylinositol 3-kinases
- Pollen proteins Ole e I family
- Prokaryotic transglycosylases
- Protein prenyltransferases alpha subunit repeat
- Renal dipeptidases
- Trehalases
2.2 A special emphasis on four "model" organisms
We have selected four organisms that are the target of genome sequencing
and/or mapping projects and for which we intend to:
- Be as complete as possible. All sequences available at a given time
should be immediately included in SWISS-PROT. This also includes
sequence corrections and updates.
- Provide a high level of annotations.
- Provide cross-references to specialized database(s) that contain,
among other data, some genetic information about the genes that code
for these proteins.
- Provide specific indices or documents.
The four organisms selected are:
o Caenorhabditis elegans (worm)
o Drosophila melanogaster (fly)
o Escherichia coli
o Saccharomyces cerevisiae (yeast)
Such a special effort has been going on for more than a year for
Escherichia coli (thanks to a very fruitful collaboration with Ken
Rudd). It has started in this release for yeast: about 300 new yeast
sequences were entered, about 1500 entries were reannotated, and a new
document (YEAST.TXT) is provided that lists the yeast entries in
SWISS-PROT classified by gene name and synonym(s). The next release will
target C.elegans, thanks to a collaboration with the group at the Sanger
Genome Center in Hinxton (UK). The Drosophila "project" should start a
bit later.
Organism        Database                Index file
                X-referenced            provided
--------------  ----------------------  --------------
C.elegans       WormPep                 Next release
D.melanogaster  Flybase                 In preparation
E.coli          EcoGene                 ECOLI.TXT
S.cerevisiae    LISTA (in preparation)  YEAST.TXT
2.3 The ExPASy World-Wide Web server
The recent months have seen a tremendous increase in the availability of
software tools and applications that make efficient use of the varied
resources of the Internet. Three of these Networked Information
Retrieval (NIR) tools are widely used by the biological community: WAIS
(Wide Area Information Server), Gopher and, more recently, the
World-Wide Web (WWW). As many organizations provide WAIS or Gopher
servers that offer access to SWISS-PROT and PROSITE, we felt that there
was no need to set up such a service ourselves. But no such server was
yet available for WWW.
The World-Wide Web (WWW), which originated at CERN, is a powerful global
information system merging networked information retrieval and
hypertext. It gives access, using hypertext links, to the documents and
information contained in all the existing WWW servers around the world,
as well as to the data obtainable through other information retrieval
systems like WAIS, Gopher, X500, etc. To access a WWW server, one has to
run on a local computer a client program (a WWW browser), which displays
hypertext documents. The user can then either request a keyword search
or jump to another document by following a hypertext link. WWW has the
outstanding advantages of extending the hypertext model to the whole
world (by allowing hypertext jumps to documents anywhere on the
Internet) and of being device and user-interface independent (browsers
exist for a variety of computers and user interfaces, including Unix
workstations running X Windows, Macintoshes and PCs with Microsoft
Windows).
A WWW server has been set up by Ron Appel from the group of Denis
Hochstrasser at the Faculty of Medicine of the University of Geneva on
the ExPASy molecular biology server. It allows access, using the user-
friendly hypertext model, to the SWISS-PROT and SWISS-2DPAGE databases
and, through any SWISS-PROT protein sequence entry, to other databases
such as EMBL, PROSITE, REBASE, Flybase, PDB and OMIM. With a browser
that can display images, one can also remotely access 2D gel image data
from SWISS-2DPAGE.
A WWW server can be accessed on the internet through its Uniform
Resource Locator (URL), the addressing system defined by the WWW model.
The URL for the ExPASy molecular biology WWW server is:
http://expasy.hcuge.ch/
or
http://129.195.254.61/
To access a WWW server, you need to run a browser (or client) program on
your local computer. Browsers exist for a variety of machines and may be
obtained by anonymous ftp. Here is a selected list (taken from the CERN
WWW server) of currently available browsers and the ftp address from
which they can be retrieved:
NCSA Mosaic  a very flexible and powerful browser with a graphical
             user interface. Available for Unix boxes using X11/Motif,
             for Macintoshes and for Microsoft Windows. FTP site:
             ftp.ncsa.uiuc.edu (in /Web/xmosaic).
lynx         a full screen browser for vt100 terminals, using arrow
             keys, highlighting, etc. FTP site: ftp2.cc.ukans.edu (in
             /pub/lynx).
www          a basic line mode browser giving access to WWW from any
             dumb terminal. FTP site: info.cern.ch (in /pub/www).
To access all the data available from SWISS-2DPAGE, the user's local
computer needs to run an image viewing program. For most browsers on
Unix workstations the default program is xv, a shareware application
developed by John Bradley at the University of Pennsylvania. The program can
be found by ftp at export.lcs.mit.edu (in /contrib).
For more information on the ExPASy WWW server, please contact Dr. Ron
Appel:
Email: appel at cih.hcuge.ch
Tel: +41-22-372 62 64
Fax: +41-22-372 61 98
2.4 Changes in the DR line
We have added cross-references to the WormPep collection of candidate
protein translations from the Caenorhabditis elegans genome sequencing
project (see section 2.2 of these notes). These cross-references are
present in the DR lines:
Data bank identifier: WORMPEP
Primary identifier  : Cosmid-derived name given to that protein by the
                      C.elegans genome sequencing project. In general
                      this name will not change.
Secondary identifier: Number attributed by the C.elegans genome
                      sequencing project to that protein. This number
                      will change when the sequence is updated.
Example             : DR   WORMPEP; ZK637.7; CE00437.
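For software developers, the three fields of such a line can be separated as in the sketch below. The CrossRef type and the parse_dr_line function are illustrative names only; the sketch assumes that fields are separated by ";" and that the line ends with a period, as in the example above.

```python
from typing import NamedTuple, Optional


class CrossRef(NamedTuple):
    databank: str
    primary: str
    secondary: str


def parse_dr_line(line: str) -> Optional[CrossRef]:
    """Split a three-field DR line; return None for anything else."""
    if not line.startswith("DR"):
        return None
    # Drop the line code and the terminating period, then split on ';'.
    body = line[2:].strip().rstrip(".")
    fields = [field.strip() for field in body.split(";")]
    if len(fields) != 3:
        return None
    return CrossRef(*fields)


ref = parse_dr_line("DR   WORMPEP; ZK637.7; CE00437.")
print(ref)
```

Since only the secondary identifier is expected to change when a sequence is updated, a program that stores such cross-references would probably want to use the primary identifier as its key.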
2.5 Weekly updates of SWISS-PROT
Since release 24, we have provided weekly updates of SWISS-PROT. These updates
are available by anonymous FTP. Three files are updated every week:
new_seq.dat  Contains all the new entries since the last full release.
upd_seq.dat  Contains the entries for which the sequence data has been
             updated since the last release.
upd_ann.dat  Contains the entries for which one or more annotation
             fields have been updated since the last release.
Currently these files are available on the following anonymous ftp
servers:
Organization  EMBL ftp server
Address       ftp.embl-heidelberg.de (or 192.54.41.33)
Directory     /pub/databases/swissprot/new
Organization  ExPASy (Geneva University Expert Protein Analysis System)
Address       expasy.hcuge.ch (or 129.195.254.61)
Directory     /databases/swiss-prot/updates
Organization  National Center for Biotechnology Information (NCBI)
Address       ncbi.nlm.nih.gov (or 130.14.20.1)
Directory     /repository/swiss-prot/updates
!!! Important notes !!!
Although we try to follow a regular schedule, we do not promise to
update these files every week. In some cases two weeks will elapse
between two updates.
Due to the current mechanism used to build a release, the entries that
are provided in these updates are not guaranteed to be error free. Also,
for the same reason, new entries do not contain an OC (Organism
Classification) line.
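Programs that rely on the OC line may therefore want to detect the entries that lack it before processing an update file. The following is a minimal sketch, assuming the usual flat-file layout (an ID line that starts each entry and a "//" line that ends it); the entry names in the sample are invented for illustration.

```python
def entries_without(text, line_code="OC"):
    """Return the names of the entries that have no line of the given type."""
    missing, name, seen = [], None, False
    for line in text.splitlines():
        if line.startswith("ID "):
            # The entry name is the first token after the line code.
            name, seen = line.split()[1], False
        elif line.startswith(line_code + " "):
            seen = True
        elif line.startswith("//"):
            if name is not None and not seen:
                missing.append(name)
            name = None
    return missing


# Hypothetical two-entry sample: the second entry has no OC line.
SAMPLE = """\
ID   OLD1_ECOLI     STANDARD;      PRT;   100 AA.
OS   ESCHERICHIA COLI.
OC   PROKARYOTA; GRACILICUTES.
//
ID   NEW1_YEAST     STANDARD;      PRT;   200 AA.
OS   SACCHAROMYCES CEREVISIAE (BAKER'S YEAST).
//
"""

no_oc = entries_without(SAMPLE)
print(no_oc)
```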
3. ENZYME AND PROSITE
3.1 The ENZYME data bank
Release 14.0 of the ENZYME data bank is distributed with release 27 of
SWISS-PROT. ENZYME release 14.0 contains information relative to 3489
enzymes.
3.2 The PROSITE data bank
3.2.1 What's new in release 11.0
Release 11.0 of the PROSITE data bank is distributed with release 27 of
SWISS-PROT. Release 11.0 contains 715 documentation chapters that
describe 926 different patterns. Since the last major release of
PROSITE (release 10.00 of December 1992), 80 new chapters have been
added and 306 chapters have been updated. The new chapters are listed
below:
- Protein splicing signature
- CAP-Gly domain signature
- MAM domain signature
- Prokaryotic transcription elongation factors signatures
- MCM2/3/5 family signature
- XPGC protein signatures
- Bacterial regulatory proteins, arsR family signature
- Bacterial regulatory proteins, deoR family signature
- Dps protein family signatures
- Ribosomal protein L27 signature
- Ribosomal protein L36 signature
- mutT domain signature
- 3-hydroxyisobutyrate dehydrogenase signature
- Dihydroorotate dehydrogenase signatures
- Alanine dehydrogenase and pyridine nucleotide transhydrogenase
- Lysyl oxidase putative copper-binding region signature
- 6-hydroxy-D-nicotine oxidase and reticuline oxidase FAD-binding
- Cytochrome c oxidase subunit VB, zinc binding region signature
- Indoleamine 2,3-dioxygenase signatures
- Glycine radical signature
- Uroporphyrin-III C-methyltransferase signatures
- Protein prenyltransferases alpha subunit repeat signature
- Phosphatidylinositol 3-kinase signatures
- Glutamate 5-kinase signature
- Guanylate kinase signature
- ADP-glucose pyrophosphorylase signatures
- 2'-5'-oligoadenylate synthetases signatures
- Deoxyribonuclease I signatures
- Glucoamylase active site region signature
- Trehalase signatures
- Glycosyl hydrolases family 8 signature
- Prokaryotic transglycosylases signature
- Renal dipeptidase active site
- Serine proteases, ompT family signatures
- Proteasome B-type subunits signature
- Signal peptidases II signature
- Cytidine & deoxycytidylate deaminases zinc-binding region signature
- GTP cyclohydrolase I signatures
- Nitrilases / cyanide hydratase signatures
- Orn/DAP/Arg decarboxylases family 2 signatures
- Uroporphyrinogen decarboxylase signatures
- Alpha-isopropylmalate and homocitrate synthases signatures
- Beta-eliminating lyases pyridoxal-phosphate attachment site
- Dihydroxy-acid and 6-phosphogluconate dehydratases signatures
- Prephenate dehydratase signatures
- Cysteine synthase pyridoxal-phosphate attachment site
- Cys/Met metabolism enzymes pyridoxal-phosphate attachment site
- Cytochrome c and c1 heme lyases signatures
- Aspartate and glutamate racemases signatures
- Mandelate racemase / muconate lactonizing enzyme family signatures
- Phosphoglucomutase and phosphomannomutase phosphoserine signature
- D-alanine--D-alanine ligase signatures
- Carbamoyl-phosphate synthase subdomain signatures
- Nickel-dependent hydrogenases b-type cytochrome subunit signatures
- Adrenodoxin family, iron-sulfur binding region signature
- ABC-2 type transport system integral membrane proteins signature
- Acyl-CoA-binding protein signature
- LacY family proton/sugar symporters signatures
- Sodium:alanine symporter family signature
- Sodium:galactoside symporter family signature
- Osteopontin signature
- Peripherin / rom-1 signature
- Interleukins -4 and -13 signature
- Erythropoietin signature
- Galanin signature
- Chaperonins clpA/B signatures
- Bacterial type II secretion system protein D signature
- Bacterial type II secretion system protein F signature
- MARCKS family signatures
- Elongation factor 1 beta/beta'/delta chain signatures
- Eukaryotic initiation factor 4E signature
- Calsequestrin signatures
- GTP1/OBG family signature
- HIT family signature
- Ependymins signatures
- Epimorphin family signature
- Yeast PIR proteins repeats signature
- Oleosins signature
- Pollen proteins Ole e I family signature
- Hypothetical YCR59c/yigZ family signature
3.2.2 Future developments
Starting with the next major release (12.0 of May 1994), PROSITE will
be extended to include weight matrices (also known as profiles). There
are a number of protein families as well as functional or structural
domains that cannot be detected using patterns due to their extreme
sequence divergence. Typical examples of important functional domains
which are weakly conserved are the immunoglobulin domains, the SH2 and
SH3 domains, or the fibronectin type III domain. In such domains there
are only a few sequence positions which are well conserved. Any attempt
to build a consensus pattern for such regions will either fail to
pick up a significant proportion of the protein sequences that contain
such a region (false negatives) or will pick up too many proteins that
do not contain the region (false positives). The use of techniques based
on weight matrices or profiles allows the detection of such proteins or
domains. Dr. Philipp Bucher at ISREC in Lausanne and I are
collaborating to include such methods in PROSITE. This collaboration
also includes other participants such as Roland Luethy (AMGEN), Michael
Gribskov (SDSC) and Steve Altschul (NCBI). If you are interested in
participating in this project please contact Philipp Bucher at:
pbucher at isrec-sun1.unil.ch
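The difference between a pattern and a weight matrix described above can be illustrated with a toy example. The sketch below is not the PROSITE profile method or syntax: the four-residue motif, the scores, the threshold and the three test sequences are all invented for illustration. A pattern either matches or does not, whereas a weight matrix adds up one score per position, so that a sequence which deviates at weakly conserved positions can still be reported.

```python
import re

# Toy consensus pattern for a four-residue motif (hypothetical).
PATTERN = re.compile(r"G[KR]C[ST]")

# Toy weight matrix: one column of residue scores per motif position.
WEIGHTS = [
    {"G": 3, "A": 1},
    {"K": 2, "R": 2, "Q": 1},
    {"C": 4},
    {"S": 2, "T": 2, "A": 1},
]
UNLISTED = -2   # score for a residue that a column does not list
THRESHOLD = 7   # arbitrary cut-off for reporting a match


def pattern_hit(seq):
    """True if the consensus pattern occurs anywhere in the sequence."""
    return PATTERN.search(seq) is not None


def best_matrix_score(seq):
    """Highest summed score over all windows of the motif width."""
    width = len(WEIGHTS)
    windows = (seq[i:i + width] for i in range(len(seq) - width + 1))
    return max(
        (sum(col.get(res, UNLISTED) for col, res in zip(WEIGHTS, window))
         for window in windows),
        default=float("-inf"),
    )


def matrix_hit(seq):
    """True if the best window reaches the reporting threshold."""
    return best_matrix_score(seq) >= THRESHOLD


for seq in ("MGKCSL", "MAQCTL", "MLLPVW"):
    print(seq, pattern_hit(seq), best_matrix_score(seq), matrix_hit(seq))
```

With these made-up values the second sequence is missed by the pattern (a false negative) but is still reported by the matrix, while the unrelated third sequence is rejected by both.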
We will include in the next release notes of SWISS-PROT (Release 28 of
February 1994) a brief description of the PROSITE syntax extension for
profiles. The full description will be available in the User's Manual of
release 11.1 of PROSITE (February 1994).
Important notice for software developers: the integration of profiles
into PROSITE will not "break" the current format. The profile entries
in the PROSITE.DAT file will be tagged with the token "MATRIX" on the
"ID" line (currently, only "PATTERN" and "RULE" are used as tokens); a
new line-type "MA" will be used in these entries to store all the
weight-matrix-specific parameters. The format of the PROSITE.DOC file
will not be changed.
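A parser that already reads the "ID" line should only need to accept one more token. The sketch below shows one way to do this, assuming the usual layout of the line (entry name, a semicolon, the token, a final period). The function name is illustrative, and EXAMPLE_PROFILE is a made-up entry name since no profile entry has been released yet.

```python
KNOWN_TYPES = {"PATTERN", "RULE", "MATRIX"}


def parse_prosite_id(line):
    """Return (entry name, entry type) from a PROSITE ID line."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line: %r" % line)
    # Drop the line code and the final period, then split at the ';'.
    name, _, entry_type = line[2:].strip().rstrip(".").partition(";")
    entry_type = entry_type.strip()
    if entry_type not in KNOWN_TYPES:
        raise ValueError("unknown entry type: %r" % entry_type)
    return name.strip(), entry_type


current = parse_prosite_id("ID   ASN_GLYCOSYLATION; PATTERN.")
future = parse_prosite_id("ID   EXAMPLE_PROFILE; MATRIX.")  # hypothetical entry
print(current, future)
```

Programs that only handle patterns can then simply skip the entries whose type is "MATRIX", together with their "MA" lines.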
4. WE NEED YOUR HELP !
We welcome feedback from our users. We would especially appreciate it if
you would notify us when you find that sequences belonging to your field
of expertise are missing from the data bank. We would also like to be
notified about annotations to be updated, for example if the function
of a protein has been clarified or if new post-translational information
has become available.
APPENDIX A: SOME STATISTICS
A.1 Amino acid composition
A.1.1 Composition in percent for the complete data bank
Ala (A) 7.66     Gln (Q) 4.02     Leu (L) 9.20     Ser (S) 7.09
Arg (R) 5.23     Glu (E) 6.26     Lys (K) 5.81     Thr (T) 5.83
Asn (N) 4.45     Gly (G) 7.04     Met (M) 2.35     Trp (W) 1.30
Asp (D) 5.27     His (H) 2.25     Phe (F) 3.99     Tyr (Y) 3.22
Cys (C) 1.78     Ile (I) 5.55     Pro (P) 5.03     Val (V) 6.52
Asx (B) 0.005    Glx (Z) 0.005    Xaa (X) 0.02
A.1.2 Classification of the amino acids by their frequency
Leu, Ala, Ser, Gly, Val, Glu, Thr, Lys, Ile, Asp, Arg, Pro, Asn, Gln,
Phe, Tyr, Met, His, Cys, Trp
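The classification above follows directly from the percentages given in A.1.1, as the short sketch below shows. It copies the values for the twenty standard amino acids (the ambiguity codes B, Z and X are left out) and sorts them by decreasing frequency.

```python
# Percentages copied from the composition table in A.1.1.
composition = {
    "Ala": 7.66, "Arg": 5.23, "Asn": 4.45, "Asp": 5.27, "Cys": 1.78,
    "Gln": 4.02, "Glu": 6.26, "Gly": 7.04, "His": 2.25, "Ile": 5.55,
    "Leu": 9.20, "Lys": 5.81, "Met": 2.35, "Phe": 3.99, "Pro": 5.03,
    "Ser": 7.09, "Thr": 5.83, "Trp": 1.30, "Tyr": 3.22, "Val": 6.52,
}

# Most frequent amino acid first.
ranking = sorted(composition, key=composition.get, reverse=True)
print(", ".join(ranking))
```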
A.2 Distribution of the sequences by their organism of origin
Total number of species represented in this release of SWISS-PROT: 4143
A.2.1 Table of the frequency of occurrence of species
Species represented      1x: 1879
                         2x:  685
                         3x:  387
                         4x:  247
                         5x:  186
                         6x:  143
                         7x:   88
                         8x:   78
                         9x:   76
                        10x:   48
                    11- 20x:  151
                    21- 50x:  103
                    51-100x:   33
                      >100x:   39
A.2.2 Table of the most represented species
Number  Frequency  Species
     1       2530  Human
     2       2376  Escherichia coli
     3       1563  Baker's yeast (Saccharomyces cerevisiae)
     4       1496  Mouse
     5       1385  Rat
     6        654  Bovine
     7        579  Fruit fly (Drosophila melanogaster)
     8        495  Bacillus subtilis
     9        489  Chicken
    10        369  African clawed frog (Xenopus laevis)
    11        341  Salmonella typhimurium
              341  Rabbit
    13        311  Pig
    14        251  Vaccinia virus (strain Copenhagen)
    15        224  Maize
    16        200  Bacteriophage T4
    17        193  Human cytomegalovirus (strain AD169)
    18        190  Arabidopsis thaliana (Mouse-ear cress)
    19        183  Vaccinia virus (strain WR)
    20        180  Rice
    21        166  Pseudomonas aeruginosa
              166  Tobacco
    23        164  Pea
    24        162  Wheat
    25        155  Caenorhabditis elegans
              155  Fission yeast (Schizosaccharomyces pombe)
    27        138  Barley
    28        133  Soybean
    29        131  Slime mold (Dictyostelium discoideum)
    30        130  Spinach
    31        129  Staphylococcus aureus
    32        127  Sheep
    33        119  Marchantia polymorpha (Liverwort)
    34        118  Rhodobacter capsulatus
    35        117  Dog
    36        114  Pseudomonas putida
    37        111  Neurospora crassa
    38        110  Klebsiella pneumoniae
    39        104  Bacillus stearothermophilus
A.3 Distribution of the sequences by size
From   To  Number     From   To  Number
   1-  50    1966     1001-1100     321
  51- 100    3345     1101-1200     199
 101- 150    4723     1201-1300     156
 151- 200    3191     1301-1400      97
 201- 250    2804     1401-1500      86
 251- 300    2469     1501-1600      44
 301- 350    2289     1601-1700      46
 351- 400    2377     1701-1800      37
 401- 450    1780     1801-1900      43
 451- 500    1933     1901-2000      31
 501- 550    1325     2001-2100      13
 551- 600     922     2101-2200      40
 601- 650     645     2201-2300      48
 651- 700     488     2301-2400      16
 701- 750     475     2401-2500      18
 751- 800     368         >2500     100
 801- 850     270
 851- 900     293
 901- 950     182
 951-1000     189
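The table uses classes of 50 residues up to 1000, classes of 100 residues up to 2500, and a single class above that. The sketch below shows how a sequence length could be assigned to its class under that reading of the table; size_class is an illustrative name, and the two short lengths in the usage example are arbitrary.

```python
def size_class(length):
    """Return the (from, to) size class of a sequence length.

    The upper bound is None for the open-ended class above 2500.
    """
    if length < 1:
        raise ValueError("sequence length must be at least 1")
    if length > 2500:
        return (2501, None)
    width = 50 if length <= 1000 else 100
    start = (length - 1) // width * width + 1
    return (start, start + width - 1)


# 5037 and 4563 are two of the sequence lengths listed in this appendix;
# 120 and 75 are arbitrary examples.
for length in (5037, 4563, 120, 75):
    print(length, size_class(length))
```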
Currently the ten largest sequences are:
RYNR_RABIT 5037 a.a.
RYNR_HUMAN 5032 a.a.
APB_HUMAN 4563 a.a.
APOA_HUMAN 4548 a.a.
RRPA_CVMJH 4488 a.a.
DYHC_TRIGR 4466 a.a.
GRSB_BACBR 4451 a.a.
PLEC_RAT 4140 a.a.
POLG_BVDV 3988 a.a.
VGF1_IBVB 3951 a.a.
APPENDIX B: ON-LINE EXPERTS
B.1 List of on-line experts for PROSITE and SWISS-PROT
Field of expertise             Name                Email address
-----------------------------  ------------------  ----------------------------
Alcohol dehydrogenases         Joernvall H.        hans.jornvall at k1m.ki.se
                               Persson B.          bengt.persson at embl-heidelberg.de
Aldehyde dehydrogenases        Joernvall H.        hans.jornvall at k1m.ki.se
                               Persson B.          bengt.persson at embl-heidelberg.de
Alpha-crystallins/HSP-20       Leunissen J.A.M.    jackl at caos.caos.kun.nl
                               de Jong W.          u629000 at hnykun11.bitnet
Alpha-2-macroglobulins         Van Leuven F.       fred at blekul13.bitnet
AA-tRNA synthetases class II   Leberman R.         leberman at frembl51.bitnet
Apolipoproteins                Boguski M.S.        boguski at ncbi.nlm.nih.gov
AraC family HTH proteins       Ramos J.L.          jlramos at cnbvx3.cnb.uam.es
Arrestins                      Kolakowski L.F.Jr.  kolakowski at helix.mgh.harvard.edu
Asparaginase / glutaminase     Gribskov M.         gribskov at sdsc.edu
ATP synthase c subunit         Recipon H.          recipon at ncbi.nlm.nih.gov
Band 4.1 family proteins       Rees J.             jrees at vax.oxford.ac.uk
Beta-lactamases                Brannigan J.        jab5 at vaxa.york.ac.uk
Beta-transducin family         Boguski M.S.        boguski at ncbi.nlm.nih.gov
C-type lectin domain           Drickamer K.        drick at cuhhca.hhmi.columbia.edu
Chalcone/stilbene synthases    Schroeder J.        raf at sun1.ruf.uni-freiburg.de
Chaperonins cpn10/cpn60        Georgopoulos C.     georgopo at cmu.unige.ch
Chaperonins TCP1 family        Willison K.R.       willison at icr.ac.uk
Chitinases                     Henrissat B.        bernie at cermav.grenet.fr
Clusterin                      Peitsch M.C.        peitsch at ulbio1.unil.ch
Cold shock domain              Landsman D.         landsman at ncbi.nlm.nih.gov
CTF/NF-I                       Mermod N.           nmermod at ulys.unil.ch
                               Gronostajski R.     gronosr at ccsmtp.ccf.org
Cytochromes P450               Holsztynska E.J.    ela at netcom.uucp
                                                   netcom!ela at apple.com
DEAD-box helicases             Linder P.           linder at urz.unibas.ch
Deoxyribonuclease I            Peitsch M.C.        peitsch at ulbio1.unil.ch
dnaJ family                    Kelley W.           kelley at cmu.unige.ch
EF-hand calcium-binding        Cox J.A.            cox at sc2a.unige.ch
                               Kretsinger R.H.     rhk5i at virginia.bitnet
Elongation factor 1            Amons R.            wmbamons at rulgl.leidenuniv.nl
Enoyl-CoA hydratase            Hofmann K.O.        khofmann at biomed.biolan.uni-koeln.de
Fatty acid desaturases         Piffanelli P.       piffanelli at jii.afrc.ac.uk
fruR/lacI family HTH proteins  Reizer J.           jreizer at ucsd.edu
GATA-type zinc-fingers         Boguski M.S.        boguski at ncbi.nlm.nih.gov
GDP/GTP dissociation stimul.   Boguski M.S.        boguski at ncbi.nlm.nih.gov
GltP family of transporters    Hofmann K.O.        khofmann at biomed.biolan.uni-koeln.de
Glucanases                     Henrissat B.        bernie at cermav.grenet.fr
                               Beguin P.           phycel at pasteur.bitnet
Glutamine synthetase           Tateno Y.           ytateno at genes.nig.ac.jp
G-protein coupled receptors    Chollet A.          arc3029 at ggr.co.uk
                               Attwood T.K.        bph6tka at biovax.leeds.ac.uk
                               Kolakowski L.F.Jr.  kolakowski at helix.mgh.harvard.edu
GTPase-activating proteins     Boguski M.S.        boguski at ncbi.nlm.nih.gov
HIT family                     Seraphin B.         seraphin at embl-heidelberg.de
HMG1/2 and HMG-14/17           Landsman D.         landsman at ncbi.nlm.nih.gov
Inorganic pyrophosphatases     Kolakowski L.F.Jr.  kolakowski at helix.mgh.harvard.edu
Integrases                     Roy P.H.            2020000 at saphir.ulaval.ca
Kringle domain                 Ikeo K.             kikeo at genes.nig.ac.jp
Lipocalins                     Boguski M.S.        boguski at ncbi.nlm.nih.gov
                               Peitsch M.C.        peitsch at ulbio1.unil.ch
lysR family HTH proteins       Henikoff S.         henikoff at sparky.fhcrc.org
MAC components / perforin      Peitsch M.C.        peitsch at ulbio1.unil.ch
Malic enzymes                  Glynias M.          mglynias at ncsa.uiuc.edu
MAM domain                     Bork P.             bork at embl-heidelberg.de
MIP family proteins            Reizer J.           jreizer at ucsd.edu
Myelin proteolipid protein     Hofmann K.O.        khofmann at biomed.biolan.uni-koeln.de
Pancreatic trypsin inhibitor   Ikeo K.             kikeo at genes.nig.ac.jp
PEP requiring enzymes          Reizer J.           jreizer at ucsd.edu
pfkB carbohydrate kinases      Reizer J.           jreizer at ucsd.edu
Phytochromes                   Partis M.D.         partis at gcri.afrc.ac.uk
Plant viruses icosahedral      Koonin E.V.         koonin at ncbi.nlm.nih.gov
capsid proteins
Protein kinases                Hanks S.            hanks at vuctrvax.bitnet
                               Hunter T.           hunter at salk-sc2.sdsc.edu
PTS proteins                   Reizer J.           jreizer at ucsd.edu
Restriction-modification       Bickle T.           bickle at urz.unibas.ch
enzymes                        Roberts R.J.        roberts at neb.com
Ribosomal protein S15          Ellis S.R.          srelli01 at ulkyvm.bitnet
Ring-cleavage dioxygenases     Harayama S.         sharayam at ddbj.nig.ac.jp
Signal sequence peptidases     von Heijne G.       gvh at csb.ki.se
                               Dalbey R.E.         rdalbey at magnus.acs.ohio-state.edu
Sodium symporters              Reizer J.           jreizer at ucsd.edu
Subtilases                     Brannigan J.        jab5 at vaxa.york.ac.uk
                               Siezen R.J.         nizo at caos.caos.kun.nl
Thiol proteases                Turk B.             turk at ijs.ac.mail.yu
Thiol proteases inhibitors     Turk B.             turk at ijs.ac.mail.yu
TNF family                     Jongeneel C.V.      vjongene at isrecmail.unil.ch
TPR repeats                    Boguski M.S.        boguski at ncbi.nlm.nih.gov
Transit peptides               von Heijne G.       gvh at csb.ki.se
Type-II membrane antigens      Levy S.             levy at cellbio.stanford.edu
Uracil-DNA glycosylase         Aasland R.          aasland at embl-heidelberg.de
Vitamin K-depend. Gla domain   Price P.A.          pprice at ucsd.edu
XPGC protein                   Clarkson S.G.       clarkson at cmu.unige.ch
Xylose isomerase               Jenkins J.          jenkins at frira.afrc.ac.uk
WAP-type domain                Claverie J.-M.      jmc at ncbi.nlm.nih.gov
ZP domain                      Bork P.             bork at embl-heidelberg.de
African swine fever virus      Yanez R.J.          ryanez at cbm2.uam.es
Bacteriophage P4               Halling C.          chh9 at midway.uchicago.edu
Caenorhabditis elegans         Sonnhammer E.       esr at mrc-molecular.biology.cam.ac.uk
Chloroplast encoded proteins   Hallick R.B.        hallick at arizona.edu
Drosophila                     Ashburner M.        ma11 at phx.cam.ac.uk
Escherichia coli               Rudd K.             rudd at ncbi.nlm.nih.gov
Salmonella typhimurium         Rudd K.             rudd at ncbi.nlm.nih.gov
Snakes                         Stocklin R.         stocklin at cmu.unige.ch
B.2 Requirements to fulfill to become an on-line expert
An expert should be a scientist working with specific family(ies) of
proteins (or specific domains) and who would:
a) Review the protein sequences in SWISS-PROT and the patterns/matrices
in PROSITE relevant to their field of research.
b) Agree to be contacted by people who have obtained new sequence(s)
which seem to belong to "their" family(ies) of proteins.
c) Have access to electronic mail and be willing to use it to send and
receive data.
If you are willing to be part of this scheme please contact Amos Bairoch
at the following electronic mail address:
bairoch at cmu.unige.ch
APPENDIX C: RELATIONSHIPS BETWEEN BIOMOLECULAR DATABASES
The current status of the relationships (cross-references) between some
biomolecular databases is shown in the following schematic:
**********************
*********************** * EPD [Euk. Promot.] *
* EMBL Nucleotide * <----> **********************
* Sequence Data *
****************** * Library * **********************
* FLYBASE * <--> *********************** <----- * ECD [E. coli map] *
* [Drosophila * ^ ^ **********************
* genomic d.b.] * <------+ | |
****************** | | +--------- **********************
| | * TFD [Trans. fact.] *
| | +--------> **********************
| | |
****************** v v v **********************
* REBASE * *********************** * ENZYME [Nomencl.] *
* [Restriction * <--- * SWISS-PROT * <----- **********************
* enzymes] * * Protein Sequence * |
****************** * Data Bank * v
*********************** **********************
****************** ^ ^ | | ^ ^ | * OMIM [Diseases] *
* EcoGene/EcoSeq * | | | | | | +--------> **********************
* [E. coli] * <-----+ | | | | |
****************** | | | | +-----------> **********************
| | | | * ECO2DBASE [2D] *
| | | | **********************
****************** | | | |
* PROSITE * <--------+ | | +---------------> **********************
* [Patterns] * | | * SWISS-2DPAGE [2D] *
****************** | +---------------+ **********************
| v |
| *********************** | **********************
+--------> * PDB [3D structures] * +--> * Aarhus/Ghent [2D] *
*********************** **********************