=============================================================================
***********************************************
* GePSAN *
* Geneva Protein Sequence Analysis Newsletter *
***********************************************
Published by: Amos Bairoch
Dept. Medical Biochemistry / University of Geneva.
Switzerland
Volume 1, Number 1 / January 1991
To subscribe (or unsubscribe) to this newsletter: gepsan at cgecmu51.bitnet
To send comments/suggestions/criticisms: bairoch at cgecmu51.bitnet
Data bases availability summary
+------------+-------+------------------------------------------------------+
| Data base  | Rel.  | Email            FTP      FTP     Tape   CD-ROM      |
|            |       | EMBL File Server GenBank  NCBI                       |
+------------+-------+------------------------------------------------------+
| SWISS-PROT | 16.0  | Yes (by entry)   Soon     Yes     Yes    Yes         |
| ENZYME     | 3.0   | Yes              Yes      Yes     Yes    Yes         |
| PROSITE    | 6.0   | Yes              Yes      Yes     Yes    Yes         |
| SEQANALREF | 13.5  | Yes              Yes      Yes     No     No          |
+------------+-------+------------------------------------------------------+
SWISS-PROT/PROSITE/ENZYME tapes or CD-ROM subscription: datalib at embl.bitnet
EMBL file server email address: netserv at embl.bitnet
GenBank On-line Service FTP address: genbank.bio.net (or 134.172.1.160)
NCBI FTP address: ncbi.nlm.nih.gov (or 130.14.20.1)
=============================================================================
=============================================================================
TABLE OF CONTENTS
Volume 1, Number 1 / January 1991
1. What is GePSAN.
2. SWISS-PROT news.
3. Cross-references to OMIM in SWISS-PROT and ENZYME.
4. Biomolecular databases integration: current status.
5. NCBI, the GenInfo Backbone Database, and the ASN.1 syntax.
6. PROSITE news.
7. Updated list of public domain programs which make use of PROSITE.
8. ENZYME news.
9. Specialized databases part 1: the P450 database.
=============================================================================
=============================================================================
Section: 1
Title : What is GePSAN.
GePSAN is a newsletter that deals with aspects of protein sequence analysis
that are relevant to the data bases that are maintained at the Department of
Medical Biochemistry (DMB) of the University of Geneva, namely:
SWISS-PROT: An annotated protein sequence data base. A joint project of
the DMB and of the EMBL Data Library.
PROSITE : A dictionary of sites and patterns in proteins.
ENZYME : An enzyme nomenclature data base.
SEQANALREF: A sequence analysis bibliographic reference data base.
This newsletter will also attempt to report new developments in the field of
protein sequence analysis.
=============================================================================
=============================================================================
Section: 2
Title : SWISS-PROT news.
1) Release 16
=============
Release 16.0 of SWISS-PROT contains 18364 sequence entries, comprising
5'986'949 amino acids abstracted from 17763 references. This represents
an increase of 9% over release 15. More than 1400 sequences have been
added since release 15, the sequence data of 271 existing entries has
been updated, and the annotations of 3500 entries have been revised. In
particular, we have used review articles to update the annotations of
the following groups or families of proteins:
- Alpha and beta adrenergic receptors
- Arrestins
- Chromogranins / secretogranins
- CTF/NF-I family
- ClpP proteases
- ets family
- GABA(A) receptors
- Gram-positive cocci surface proteins
- Hexokinases
- Integrins alpha and beta chains
- NMePhe pili proteins
- p53 proteins
- Poly(ADP-ribose) polymerase
- Profilins
- S-Adenosylmethionine synthetases
- Site-specific recombinases
- Synaptobrevins
- Type-II membrane antigens
- UDP-glucuronosyl transferases
- Uteroglobin family
- LBP / BPI / CETP family
We have finished adding cross-references to human protein sequence entries
which are represented in the latest edition of OMIM (see the next section
for full details).
2) Future developments
======================
One question many users of SWISS-PROT ask me is: what exactly is the extent
of the overlap between SWISS-PROT and PIR? Up to now, cross-references (DR
lines) were provided only to entries in the annotated section of PIR (now
called PIR1), which SWISS-PROT overlaps completely. Only a few
cross-references were provided to entries in the unannotated sections of PIR
(which used to be called "NEW", but are now known as PIR2 and PIR3). In
release 16 we started adding cross-references to these sections; this task
will continue in release 17 and be completed for release 18. At that point,
users who do not want to scan two protein data banks will be able to
automatically extract from PIR2/PIR3 all the sequences that are not present
in SWISS-PROT, and to produce a file that complements SWISS-PROT. In a future
issue of this newsletter we will explain this process in detail and also
describe exactly what the differences between SWISS-PROT and PIR are.
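Once the cross-references are complete, the extraction reduces to a set difference. The Python sketch below is only an illustration, not the procedure we will describe in a future issue: it assumes the SWISS-PROT flat file is read as lines of text, that PIR2/PIR3 entries are matched on the first identifier of the "DR   PIR;" line, and that the list of PIR2/PIR3 identifiers has already been obtained (the identifiers in the example are made up).

```python
def pir_ids_cited(swissprot_lines):
    """Return the PIR identifiers cited in SWISS-PROT 'DR   PIR;' lines."""
    cited = set()
    for line in swissprot_lines:
        if line[:2] != "DR":
            continue
        # e.g. "DR   PIR; A01141; CRHU2." -> ["PIR", "A01141", "CRHU2"]
        fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
        if fields[0] == "PIR" and len(fields) > 1:
            cited.add(fields[1])
    return cited


def pir_complement(swissprot_lines, pir_ids):
    """PIR identifiers (e.g. from PIR2/PIR3) with no SWISS-PROT cross-reference."""
    cited = pir_ids_cited(swissprot_lines)
    return sorted(i for i in pir_ids if i not in cited)


entry = [
    "DR   PIR; A01141; CRHU2.",
    "DR   PIR; A23202; A23202.",
    "DR   EMBL; Y00339; HSCA2.",
]
# "X99998" and "X99999" are hypothetical stand-ins for PIR2/PIR3 identifiers.
print(pir_complement(entry, ["A23202", "X99998", "X99999"]))  # ['X99998', 'X99999']
```

The sequences for the returned identifiers would then be copied from PIR2/PIR3 into the complementary file.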
In release 18 we will invert the order of the information in the OS line.
Currently we have 'English common name (Latin name)'; we will switch to
'Latin name (English common name)'. Example:
OS HUMAN (HOMO SAPIENS).
will be changed to:
OS HOMO SAPIENS (HUMAN).
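For the common single-organism case the inversion is mechanical. The Python sketch below assumes one organism per OS line with no nested parentheses; multi-line OS lines and anything that does not match are returned unchanged. It is an illustration only, not the procedure that will be used for release 18.

```python
import re

# "OS   COMMON NAME (LATIN NAME)." -> groups: spacing, common name, Latin name
_OS_LINE = re.compile(r"^OS(\s+)(.+?) \((.+)\)\.$")


def invert_os_line(line):
    """Rewrite 'common (Latin)' as 'Latin (common)' on a single OS line."""
    match = _OS_LINE.match(line)
    if match is None:
        return line  # not the simple one-organism form: leave it alone
    spacing, common, latin = match.groups()
    return f"OS{spacing}{latin} ({common})."


print(invert_os_line("OS   HUMAN (HOMO SAPIENS)."))  # OS   HOMO SAPIENS (HUMAN).
```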
We hope to also provide, in release 18, cross-references to TFD (the
relational database of transcription factors from David Ghosh, NCBI / USA).
3) News concerning SWISS-PROT availability
==========================================
a) New SWISS-PROT entries and updates to existing entries are now available
in between regular releases from the EMBL File Server. They are not
provided on a daily basis like new nucleotide entries, but we intend to
make at least two or three sets of incremental updates available between
releases.
b) SWISS-PROT is now available for download by FTP from the NCBI server.
All the files are in the /repository/SWISS-PROT directory.
c) SWISS-PROT will soon also be available by FTP from the GenBank
On-line Service (GOS) server.
=============================================================================
=============================================================================
Section: 3
Title : Cross-references to OMIM in SWISS-PROT and ENZYME.
OMIM is the on-line version of Mendelian Inheritance in Man (MIM), the
well-known book by Victor McKusick [1], which holds clinical data on a range
of human genetic diseases as well as all known gene loci. During the last five
months we have implemented cross-references to OMIM in both SWISS-PROT and
ENZYME.
[1] McKusick Victor A.
Mendelian Inheritance in Man
Catalogs of autosomal dominant, autosomal recessive, and X-linked
phenotypes
Ninth edition
Johns Hopkins University Press, Baltimore, (1990).
In practice, the following has been done in SWISS-PROT:
1) In each human protein entry whose gene was found to be described in
OMIM, a DR (cross-reference) line was added that points to the six-digit
OMIM catalog number.
Example:
DR MIM; 261600; NINTH EDITION.
Currently (in release 16.0 of SWISS-PROT) there are 840 human protein
sequence entries with one or more DR lines that point to OMIM.
A new document file, called MIMTOSP.TXT, is provided with SWISS-PROT; it
is a sorted list of the MIM catalog entries cross-referenced in SWISS-PROT
and the corresponding protein sequence entry names.
2) If the protein is associated with a genetic defect or disease, this has
been indicated in the CC lines using the "DISEASE" topic.
Examples:
CC -!- DISEASE: THIS ENZYME IS DEFICIENT IN TWO GENETIC DISEASES: THE
CC LESCH-NYHAN SYNDROME, IN WHICH THERE IS NO ENZYME ACTIVITY; AND
CC HYPERURICEMIA WITH AN EARLY ONSET OF GOUT, IN WHICH THERE IS
CC PARTIAL ENZYME ACTIVITY.
CC -!- DISEASE: DEFICIENCY OF THE ENZYME CAUSES PHENYLKETONURIA (PKU),
CC THE MOST COMMON INBORN ERROR OF AMINO ACID METABOLISM.
3) If variants of the sequences are known, they have been indicated in the
feature table using the "VARIANT" key.
Example:
FT VARIANT 103 103 S -> R (GOUT MUNICH).
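A list in the spirit of MIMTOSP.TXT can be rebuilt from the flat file itself. The Python sketch below is not the program that produces MIMTOSP.TXT and does not reproduce its layout; it assumes entries separated by '//' lines, the entry name as the first token of the ID line, and DR lines of the form "DR   MIM; 261600; NINTH EDITION." as shown above.

```python
from collections import defaultdict


def mim_to_entries(flat_file_text):
    """Map each MIM number in 'DR   MIM;' lines to the sorted entry names citing it."""
    index = defaultdict(set)
    entry_name = None
    for line in flat_file_text.splitlines():
        code = line[:2]
        if code == "ID":
            entry_name = line[2:].split()[0]  # e.g. "CAH2$HUMAN"
        elif code == "DR" and entry_name is not None:
            fields = [f.strip() for f in line[2:].strip().rstrip(".").split(";")]
            if fields[0] == "MIM" and len(fields) > 1:
                index[fields[1]].add(entry_name)
        elif code == "//":
            entry_name = None  # end of entry
    # Sorted by MIM number, names sorted within each number.
    return {mim: sorted(names) for mim, names in sorted(index.items())}
```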
Below is an example of a SWISS-PROT entry that contains all three types of
MIM-related enhancements described above.
ID CAH2$HUMAN STANDARD; PRT; 259 AA.
AC P00918;
DT 21-JUL-1986 (REL. 01, CREATED)
DT 21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT 01-NOV-1990 (REL. 16, LAST ANNOTATION UPDATE)
DE CARBONIC ANHYDRASE II (EC 4.2.1.1) (CARBONATE DEHYDRATASE II) (GENE
DE NAME: CA2).
OS HUMAN (HOMO SAPIENS).
OC EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC EUTHERIA; PRIMATES.
RN [1] (SEQUENCE FROM N.A.)
RA MONTGOMERY J.C., VENTA P.J., TASHIAN R.E., HEWETT-EMMETT D.;
RL NUCLEIC ACIDS RES. 15:4687-4687(1987).
RN [2] (SEQUENCE FROM N.A.)
RA MURAKAMI H., MARELICH G.P., GRUBB J.H., KYLE J.W., SLY W.S.;
RL GENOMICS 1:159-166(1987).
RN [3] (SEQUENCE)
RA HENDERSON L.E., HENRIKSSON D., NYMAN P.O.;
RL J. BIOL. CHEM. 251:5457-5463(1976).
RN [4] (SEQUENCE)
RA LIN K.-T.D., DEUTSCH H.F.;
RL J. BIOL. CHEM. 249:2329-2337(1974).
RN [5] (SEQUENCE OF 1-76 FROM N.A.)
RA VENTA P.J., MONTGOMERY J.C., HEWETT-EMMETT D., TASHIAN R.E.;
RL BIOCHIM. BIOPHYS. ACTA 826:195-201(1985).
RN [6] (X-RAY CRYSTALLOGRAPHY, 2.0 ANGSTROMS)
RA LILJAS A., KANNAN K.K., BERGSTEN P.-C., WAARA I., FRIDBORG K.,
RA STRANDBERG B., CARLBOM U., JARUP L., LOVGREN S., PETEF M.;
RL NATURE NEW BIOL. 235:131-137(1972).
RN [7] (X-RAY CRYSTALLOGRAPHY, 2.0 ANGSTROMS)
RA ERIKSSON A.E., JONES T.A., LILJAS A.;
RL PROTEINS 4:274-282(1988).
RN [8] (X-RAY CRYSTALLOGRAPHY, 2.0 ANGSTROMS)
RA ERIKSSON A.E., KYLSTEN P.M., JONES T.A., LILJAS A.;
RL PROTEINS 4:283-293(1988).
RN [9] (JOGJAKARTA VARIANT)
RA JONES G.L., SOFRO A.S.M., SHAW D.C.;
RL BIOCHEM. GENET. 20:979-1000(1982).
RN [10] (MELBOURNE VARIANT)
RA JONES G.L., SHAW D.C.;
RL HUM. GENET. 63:392-399(1983).
CC -!- CATALYTIC ACTIVITY: H(2)CO(3) = CO(2) + H(2)O (REVERSIBLE
CC HYDRATION OF CARBON DIOXIDE).
CC -!- THERE ARE AT LEAST 6 ENZYMATIC FORMS OF CARBONIC ANHYDRASE: CA-I
CC (OR B), CA-II (OR C), CA-III (OR M), CA-IV, CA-V AND CA-VI.
CC -!- DISEASE: DEFECTS IN CA2 ARE THE CAUSE OF OSTEOPETROSIS WITH RENAL
CC TUBULAR ACIDOSIS (MARBLE BRAIN DISEASE).
DR EMBL; Y00339; HSCA2.
DR EMBL; X03251; HSCAII.
DR EMBL; J03037; HSCAIIA.
DR PIR; A01141; CRHU2.
DR PIR; A23202; A23202.
DR PIR; A27175; A27175.
DR PDB; 1CA2; 15-JAN-90.
DR PDB; 2CA2; 15-APR-90.
DR PDB; 3CA2; 15-APR-90.
DR MIM; 259730; NINTH EDITION.
KW LYASE; ACETYLATION; ZINC; 3D-STRUCTURE.
FT INIT_MET 0 0
FT MOD_RES 1 1 ACETYLATION.
FT ACT_SITE 63 63
FT ACT_SITE 66 66
FT METAL 93 93 ZINC, CATALYTIC.
FT METAL 95 95 ZINC, CATALYTIC.
FT METAL 118 118 ZINC, CATALYTIC.
FT ACT_SITE 126 126
FT ACT_SITE 196 198
FT VARIANT 17 17 K -> E (JOGJAKARTA).
FT VARIANT 235 235 P -> H (MELBOURNE).
FT VARIANT 251 251 N -> D.
SQ SEQUENCE 259 AA; 29115 MW; 365693 CN;
SHHWGYGKHN GPEHWHKDFP IAKGERQSPV DIDTHTAKYD PSLKPLSVSY DQATSLRILN
NGHAFNVEFD DSQDKAVLKG GPLDGTYRLI QFHFHWGSLD GQGSEHTVDK KKYAAELHLV
HWNTKYGDFG KAVQQPDGLA VLGIFLKVGS AKPGLQKVVD VLDSIKTKGK SADFTNFDPR
GLLPESLDYW TYPGSLTTPP LLECVTWIVL KEPISVSSEQ VLKFRKLNFN GEGEPEELMV
DNWRPAQPLK NRQIKASFK
//
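The three MIM-related enhancements can be pulled out of an entry mechanically, and the VARIANT lines can be checked against the SQ block, with residues numbered from 1 after the initiator methionine (as the "INIT_MET 0 0" line indicates). The Python sketch below assumes the layout of the entry above; it skips FT lines whose positions are not plain integers (for example continuation lines or uncertain ends) and is not a SWISS-PROT tool.

```python
import re


def parse_entry(text):
    """Collect MIM numbers, DISEASE comments, FT features and the sequence."""
    mim, comments, features, seq_parts = [], [], [], []
    declared_length, in_sequence = None, False
    for line in text.splitlines():
        code, rest = line[:2], line[2:].strip()
        if code == "//":
            break
        if in_sequence:
            seq_parts.append("".join(line.split()))  # drop blanks between blocks
        elif code == "SQ":
            declared_length = int(re.search(r"(\d+) AA", rest).group(1))
            in_sequence = True
        elif code == "DR" and rest.startswith("MIM;"):
            mim.append(rest.split(";")[1].strip())
        elif code == "CC":
            if rest.startswith("-!-"):
                comments.append(rest[3:].strip())  # start of a new comment
            elif comments:
                comments[-1] += " " + rest  # continuation of the previous one
        elif code == "FT":
            parts = rest.split(None, 3)
            # Skip lines without two plain integer positions.
            if len(parts) >= 3 and parts[1].isdigit() and parts[2].isdigit():
                desc = parts[3] if len(parts) == 4 else ""
                features.append((parts[0], int(parts[1]), int(parts[2]), desc))
    diseases = [c[len("DISEASE:"):].strip() for c in comments if c.startswith("DISEASE:")]
    return {"length": declared_length, "sequence": "".join(seq_parts),
            "mim": mim, "diseases": diseases, "features": features}


def check_variants(entry):
    """List single-residue VARIANT lines whose original residue disagrees with SQ."""
    sequence = entry["sequence"]
    problems = []
    if entry["length"] != len(sequence):
        problems.append(f"SQ declares {entry['length']} AA, found {len(sequence)}")
    for key, start, end, desc in entry["features"]:
        match = re.match(r"([A-Z]) -> ([A-Z])", desc)
        if key != "VARIANT" or start != end or match is None:
            continue
        if not 1 <= start <= len(sequence) or sequence[start - 1] != match.group(1):
            problems.append(f"VARIANT {start}: expected {match.group(1)}")
    return problems
```

For the CAH2$HUMAN entry above, the residues at positions 17, 235 and 251 are K, P and N, so all three VARIANT lines agree with the sequence.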
In ENZYME we have added a "DI" (DIsease) line to all enzymes that are known
to be associated with a genetic defect, as shown in the following example:
DI PHENYLKETONURIA; MIM:261600.
Here is an example of an ENZYME entry with a DI line:
ID 4.2.1.1
DE CARBONIC DEHYDRATASE.
AN CARBONIC ANHYDRASE.
CA H(2)CO(3) = CO(2) + H(2)O.
CF ZINC.
DI OSTEOPETROSIS-RENAL TUBULAR ACIDOSIS SYNDROME; MIM:259730.
DR P00917, CAH1$HORSE; P00915, CAH1$HUMAN; P00916, CAH1$MACMU;
DR P13634, CAH1$MOUSE; P07452, CAH1$RABIT; P00921, CAH2$BOVIN;
DR P07630, CAH2$CHICK; P00918, CAH2$HUMAN; P00920, CAH2$MOUSE;
DR P00919, CAH2$RABIT; P00922, CAH2$SHEEP; P07450, CAH3$HORSE;
DR P07451, CAH3$HUMAN; P16015, CAH3$MOUSE; P14141, CAH3$RAT ;
DR P18915, CAH6$BOVIN; P18761, CAH6$MOUSE; P08060, CAH6$SHEEP;
DR P17067, CAHC$PEA ; P16016, CAHC$SPIOL;
//
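The DR lines of an ENZYME entry list SWISS-PROT accession numbers and entry names, one pair per ';', with names padded by blanks (note "CAH3$RAT ;"). The short Python sketch below recovers the pairs, for instance to list every SWISS-PROT entry filed under an EC number; it assumes exactly one comma inside each pair, as in the example above.

```python
def enzyme_dr_pairs(lines):
    """Split ENZYME DR lines into (SWISS-PROT accession, entry name) pairs."""
    pairs = []
    for line in lines:
        if line[:2] != "DR":
            continue  # ignore ID, DE, CA, DI, ... lines
        for item in line[2:].split(";"):
            item = item.strip()
            if item:  # the text after the final ';' is empty
                accession, name = (part.strip() for part in item.split(","))
                pairs.append((accession, name))
    return pairs
```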
=============================================================================
=============================================================================
Section: 4
Title : Biomolecular databases integration: current status.
In the last six months there have been a number of developments concerning
the integration of biomolecular databases:
1) The EMBL Nucleotide Sequence Database is now fully cross-referenced to
SWISS-PROT.
2) SWISS-PROT and ENZYME are now cross-referenced to MIM (see section 3 of
this letter).
3) Cross-references have been added in SWISS-PROT to REBASE, the type II
restriction enzymes data base.
4) The new release (9012) of the Drosophila Genetic Maps (DMAP) database
from Michael Ashburner (Cambridge / U.K.) is now cross-referenced to
EMBL/GenBank, SWISS-PROT and PIR.
5) The new release (2.0) of the Transcription Factors Database (TFD) from
David Ghosh (NCBI / USA) is now cross-referenced to EMBL/GenBank,
SWISS-PROT and PIR.
The current status of the relationships between the biomolecular databases is
shown in the following schematic:
*********************
*********************** <----- * EPD [Promoters] *
* EMBL Nucleotide * *********************
***************** * Sequence Data *
* DMAP * ----> * Library * *********************
* [Drosophila * *********************** <----- * ECD [E.coli] *
* Genetic maps] * ^ | ^ *********************
***************** ------- + | | |
| | | | *********************
Version: Jan. 10 | | | +---------- * TFD [Trans.fact.] *
1991 | | | | *********************
| | | |
***************** v | v v *********************
* PROSITE * <---- *********************** <----- * ENZYME [Nomencl.] *
* [Patterns] * ----> * SWISS-PROT * *********************
***************** * Protein Sequence * |
* Data Bank * |
***************** *********************** v
* REBASE * | | | *********************
* [Restriction * <-------+ | +---------> * OMIM [Diseases] *
* enzymes] * | *********************
***************** v
***********************
* PDB [3D structures] *
***********************
We believe that it is now possible for software developers to start building
hypertext-oriented software packages that can navigate between the different
biomolecular data banks.
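As a sketch of the kind of navigation we have in mind, the Python code below treats the databases as nodes of a directed graph and finds the shortest chain of cross-references between two of them by breadth-first search. The edge table is an assumption limited to the links enumerated in items 1) to 5) above, each taken in the direction stated there; a real browser would of course follow links between individual entries, not whole databases.

```python
from collections import deque

# An edge A -> B means that entries of A carry pointers to entries of B
# (only the links listed in items 1) to 5) of this section).
LINKS = {
    "EMBL": ["SWISS-PROT"],
    "SWISS-PROT": ["MIM", "REBASE"],
    "ENZYME": ["MIM"],
    "DMAP": ["EMBL", "SWISS-PROT", "PIR"],
    "TFD": ["EMBL", "SWISS-PROT", "PIR"],
}


def navigation_path(start, goal, links=LINKS):
    """Shortest chain of databases from start to goal, or None if unreachable."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in links.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


print(navigation_path("DMAP", "MIM"))  # ['DMAP', 'SWISS-PROT', 'MIM']
```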
=============================================================================
=============================================================================
Section: 5
Title : NCBI, the GenInfo Backbone Database, and the ASN.1 syntax.
The National Center for Biotechnology Information (NCBI), at the National
Library of Medicine (NLM) (Bethesda, Maryland), is involved in the development
of a database building system that addresses the problems of integrated
information as well as currency and accessibility. One of its projects is
the production of an integrated nucleic acid and protein sequence database,
called the GenInfo Backbone Database ('Backbone'), that accurately reflects
the journal literature. The Backbone will include all protein sequences of at
least three amino acids and all nucleotide sequences of at least nine bases.
The annotations provided by the Backbone are minimal: it is meant to reflect
the data presented in the scientific literature, not to model biological
reality. The Backbone is a database which will, hopefully, help to build and
maintain fully annotated databases such as SWISS-PROT, PROSITE or ENZYME.
As the Backbone is a database on which to build other databases, the NCBI had
to select a reliable data exchange standard to facilitate the exchange of
information between biomolecular databases. The standard which has been
chosen is called ASN.1 (Abstract Syntax Notation 1), also known as ISO 8824.
ASN.1 is specifically designed to allow a formal precise definition of what
is exchanged between two applications without specifying how it is to be
represented or used by either application.
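The separation between an abstract definition and its representation can be seen with the Basic Encoding Rules (BER, ISO 8825), one standard way of turning ASN.1 values into bytes. The Python sketch below hand-encodes an EC number as a SEQUENCE of four INTEGERs; this type is invented for illustration and is not the NCBI's or our ENZYME specification, and only the few BER cases needed here are covered.

```python
def ber_length(n):
    """Definite-form length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def ber_integer(value):
    """UNIVERSAL INTEGER (tag 0x02) with minimal two's-complement content."""
    size = ((value if value >= 0 else ~value).bit_length() + 8) // 8
    content = value.to_bytes(size, "big", signed=True)
    return b"\x02" + ber_length(len(content)) + content


def ber_sequence(*elements):
    """UNIVERSAL SEQUENCE (constructed, tag 0x30) wrapping encoded elements."""
    body = b"".join(elements)
    return b"\x30" + ber_length(len(body)) + body


# Abstract value {1, 4, 3, 14}; the same value could equally be written in
# ASN.1 value notation or in another set of encoding rules.
ec = ber_sequence(*(ber_integer(n) for n in (1, 4, 3, 14)))
print(ec.hex())  # 300c02010102010402010302010e
```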
The NCBI is also committed to developing and distributing a software toolbox
that will help software and database developers to interact with the ASN.1
notation and the Backbone.
As a user of biomolecular databases you will probably not have to deal
directly with the Backbone or with the ASN.1 format, unless you want to
develop a new specialized biomolecular database, but you should be aware of
the existence of such projects and of the many positive consequences for the
scientific community of such an endeavor, if it is successful. As we believe
in the scientific validity and relevance of these projects, we have decided to
participate. Our participation will take at least two forms: we will provide
SWISS-PROT, ENZYME, and PROSITE in the ASN.1 syntax (the existing format will
not be discontinued) and we will start to use the Backbone as a source of
primary (literature) data for SWISS-PROT.
As a first step we have produced an ASN.1 specification for the ENZYME data
bank and will soon start to distribute an ASN.1 version of that database (see
section 8 of this newsletter).
=============================================================================
=============================================================================
Section: 6
Title : PROSITE news.
1) Release 6.0
==============
Release 6.0 of PROSITE contains 375 documentation chapters that describe 433
different patterns. Since release 5.1, 77 new chapters have been added and 131
have been updated. Release 6.0 is fully cross-referenced with release 16 of
SWISS-PROT.
There have been no changes in the format of the files of the data base.
2) Future developments
======================
- Release 6.10 will come out in March 1991 with release 17 of SWISS-PROT.
As was the case for release 5.10, it will not be a "real" update: it will
only update pointers to SWISS-PROT for sequence entries whose names have
been modified between releases 16 and 17.
- Release 7.0 will come out with release 18 of SWISS-PROT in early summer
1991. There will be many new pattern entries. We can already announce
the following ones (as they are either ready or being written):
- 6-phosphogluconate dehydrogenase signature
- Catalase signatures
- Peroxidases signature
- Acyltransferases ChoActase / COT / CPT-II family signatures
- Chalcone synthase and resveratrol synthase signature
- Glutamine amidotransferases class-I active site
- Glutamine amidotransferases class-II active site
- Polyprenyl synthetases signature
- Eukaryotic RNA polymerases 30 to 40 Kd subunits signature
- Prokaryotic carbohydrate kinases signature
- DNA polymerase family A signature
- Clostridium cellulases repeated domain signature
- ATP synthase a subunit signature
- Aconitase signature
- Guanylate cyclases signature
- FKBP peptidyl-prolyl cis-trans isomerase signatures
- Sodium symporters signatures
- Natriuretic peptides receptors signature
- PF4/IL-8 cytokines signatures
- Myotoxins signature
- Pathogenesis-related proteins BetvI family signature
We have a large (and growing) list of new patterns to add. Some of those
that are currently in the 'pipeline' are listed below.
- SH2 and SH3 domains
- Animal lectin domain
- Bacterial sensory transduction proteins signatures
- Alpha-macroglobulin family signature
- Clusterins signature
- Plants 2S seed storage proteins signature
- TNF/NGF receptors family signature
- Small heat shock proteins (HSP20)
But this is far from being a complete list!
We have not yet received any matrices from any source, so matrices will
probably not be introduced in PROSITE in release 7.0.
3) On-line experts
==================
We have added to the PROSITE documentation file (PROSITE.DOC) the email
addresses of experts in specific fields. This information is present
in the following format:
-Expert(s) to contact by email: Name X.Y.
name at location.network
As you can see from the following table, our current list of experts is still
very small, so I would like once again to call for volunteers (the
'requirements' to be fulfilled to become an on-line expert are listed at the
end of this section). Please don't be shy!
Field of expertise          Name                 Email address
--------------------------- -------------------- --------------------------
Alcohol dehydrogenases      Bengt P.             bengt at medfys.ki.se
Aldehyde dehydrogenases     Bengt P.             bengt at medfys.ki.se
Apolipoproteins             Boguski M.S.         boguski at ncbi.nlm.nih.gov
Arrestins                   Kolakowski L.F. Jr.  lfk at athena.mit.edu
Bacteriophage P4            Halling C.           chh9 at midway.uchicago.edu
Beta-lactamases             Brannigan J.         bafm1 at cluster.sussex.ac.uk
Chitinases                  Henrissat B.         cermav at frgren81.bitnet
CTF/NF-I                    Mermod N.            nmermod at clsuni51.bitnet
EF-hand calcium-binding     Cox J.A.             cox at cgeuge52.bitnet
                            Kretsinger R.H.      rhk5i at virginia.bitnet
Glucanases                  Henrissat B.         cermav at frgren81.bitnet
                            Beguin P.            phycel at pasteur.bitnet
Eryf1-type zinc-fingers     Boguski M.S.         boguski at ncbi.nlm.nih.gov
G-protein coupled receptors Chollet A.           chollet at clients.switch.ch
Inorganic pyrophosphatases  Kolakowski L.F. Jr.  lfk at athena.mit.edu
Integrases                  Roy P.H.             2020000 at lavalvx1.bitnet
Protein kinases             Hanks S.             hanks at vuctrvax
Restriction-modification    Bickle T.            bickle at urz.unibas.ch
                            Roberts R.J.         roberts at cshl.org
Ring-cleavage dioxygenases  Harayama S.          harayama at cgecmu51.bitnet
Subtilisin family proteases Brannigan J.         bafm1 at cluster.sussex.ac.uk
Thiol proteases             Turk B.              turk at ijs.ac.mail.yu
Thiol proteases inhibitors  Turk B.              turk at ijs.ac.mail.yu
TPR repeats                 Boguski M.S.         boguski at ncbi.nlm.nih.gov
Transit peptides            von Heijne G.        gunnar at cbts.sunet.se
Type-II membrane antigens   Levy S.              levy at cellbio.stanford.edu
Requirements to fulfill to become an on-line expert
===================================================
An expert should be a scientist working on a specific family (or families) of
proteins (or specific domains) who would:
a) Review the protein sequences in SWISS-PROT and the patterns/matrices
in PROSITE relevant to their field of research.
b) Agree to be contacted by people who have obtained new sequences
that seem to belong to "their" protein family (or families).
c) Have access to electronic mail and be willing to use it to send and
receive data.
If you are willing to take part in this scheme, please contact me (but
please, by email only!).
=============================================================================
=============================================================================
Section: 7
Title : Updated list of public domain programs which make use of PROSITE.
I have been made aware of the development of the following public domain
software packages that make use of PROSITE.
1) MacPattern
=============
Apple Macintosh application. Offers features like a pattern list for pattern
selection, direct access to documentation in PROSITE, pattern sets, pattern
entry by keyboard, etc. It can read SWISS-PROT, PIR, DNA Strider, DNAid,
Pearson and plain ASCII sequences. MacPattern can also use any other pattern
database adhering to the PROSITE syntax, even DNA patterns. No special hard-
or software is required.
Contact : Rainer Fuchs
fuchs at embl.bitnet
Version : 1.1
Available: On the EMBL File Server: MAC_SOFTWARE:MACPATTERN.HQX
2) Scrutineer
=============
SCRUTINEER is a sophisticated pattern searching and database analysis program
written by Peter Sibbald at EMBL. The program is written in Pascal and comes
complete with source, manual and on-line help. SCRUTINEER is described in the
following reference:
Sibbald P.R., Argos P.
Scrutineer: a computer program that flexibly seeks and describes motifs
and profiles in protein sequence databases.
CABIOS 6:279-288(1990).
SCRUTINEER works on VAXes, and can apparently be made to run on UNIX systems.
The November 1990 version of SCRUTINEER adds, among other enhancements, the
possibility of searching for all of the PROSITE patterns in one or more
protein sequences.
Contact : Peter Sibbald
sibbald at embl.bitnet
Version : Nov. 1990
Available: On the EMBL File Server: VAX_SOFTWARE:MACPATTERN.UAA
3) ProSearch
============
A program, written mostly in AWK, that runs under Unix and searches a protein
sequence for all of the PROSITE patterns. Note: it will also run under MS-DOS
and VMS if you have access to a public domain or commercial version of AWK on
such systems.
Contact : Lee F. Kolakowski
lfk at athena.mit.edu
Version : 1.1
Available: On the EMBL File Server: UNIX_SOFTWARE:PROSEARCH.UUE
4) CREGEX
=========
CREGEX creates, from the native PROSITE data bank, a file of valid AWK
regular expressions that can then be used with the ProSearch program.
Contact : Jack Leunissen
jackl at caos.caos.kun.nl
Version : 1.1
Available: On the EMBL File Server: UNIX_SOFTWARE:CREGEX.C
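The kind of translation CREGEX performs can be illustrated in a few lines. The Python sketch below targets Python's regular expressions rather than AWK and is not CREGEX's code; it covers only the basic pattern syntax: elements separated by '-', 'x' for any residue, [..] for allowed and {..} for forbidden residues, (n) or (n,m) repeats, and '<' / '>' for the N- and C-terminus when written outside brackets. Anything else (such as '>' inside brackets) is rejected.

```python
import re

# One element: optional '<', a residue / x / [..] / {..}, optional
# repeat count (n) or (n,m), optional '>'.
_ELEMENT = re.compile(
    r"^(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)$"
)


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') to a regex."""
    pieces = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = _ELEMENT.match(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, core, low, high, c_term = match.groups()
        if core == "x":
            atom = "."                      # any residue
        elif core.startswith("{"):
            atom = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            atom = core                     # single residue or [..] class as-is
        if low is not None:
            atom += "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if n_term else "") + atom + ("$" if c_term else ""))
    return "".join(pieces)


def find_sites(sequence, pattern):
    """1-based start positions and matched text, overlapping hits included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))       # N[^P][ST][^P]
print(find_sites("AANGSAA", "N-{P}-[ST]-{P}."))  # [(3, 'NGSA')]
```

The lookahead in find_sites lets overlapping occurrences be reported, which a plain left-to-right search would miss.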
5) PROINDEX
===========
VAX-Fortran program to create an index built from the information stored in
the DE lines of the PROSITE.DAT file.
Contact : Steve Clark
clark at utoroci.bitnet or clark at mshri.utoronto.ca
Available: On the EMBL File Server: VAX_SOFTWARE:PROINDEX.UUE
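An index of this kind is simple to build. The Python sketch below is not PROINDEX (which is written in VAX-Fortran) and does not reproduce its output; it assumes the PROSITE.DAT line codes AC, DE and '//' and indexes every word of the DE lines under the entry's accession number.

```python
import re
from collections import defaultdict


def de_index(dat_text):
    """Map each lower-cased word of the DE lines to the accessions using it."""
    index = defaultdict(set)
    accession = None
    for line in dat_text.splitlines():
        code = line[:2]
        if code == "AC":
            accession = line[2:].strip().rstrip(";")  # e.g. "PS00001"
        elif code == "DE" and accession is not None:
            for word in re.findall(r"[A-Za-z0-9-]+", line[2:]):
                word = word.strip("-").lower()  # "cAMP-" -> "camp"
                if word:
                    index[word].add(accession)
        elif code == "//":
            accession = None  # end of entry
    return {word: sorted(accs) for word, accs in sorted(index.items())}
```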
6) PROSITEC
===========
VAX-Pascal program to convert the PROSITE files into GCG FIND-format.
Contact : Kay Hofmann
akc01 at dk0rrzk1.bitnet
Version : 1.1
Available: On the EMBL File Server: VAX_SOFTWARE:PROSITEC.UUE
7) ProDoc
=========
VAX program for the GCG package to display documentation entries in the
PROSITE.DOC file, given a documentation entry number.
Contact : Anne Marie Quinn
quinn at salk.bitnet
Available: By anonymous ftp on: SALK-SC2.SDSC.EDU
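Looking up a chapter by number amounts to a text search. The Python sketch below is not ProDoc and does not use GCG; it assumes that each chapter of PROSITE.DOC starts with a line such as {PDOC00001} and ends with an {END} line.

```python
def doc_entry(doc_text, number):
    """Return chapter {PDOCnnnnn} of PROSITE.DOC up to (not including) {END}."""
    tag = "{PDOC%05d}" % number  # e.g. 4 -> "{PDOC00004}"
    start = doc_text.find(tag)
    if start == -1:
        return None  # no such documentation entry
    end = doc_text.find("{END}", start)
    return doc_text[start:] if end == -1 else doc_text[start:end]
```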
8) BISANCE system
=================
A program to interrogate PROSITE is available on-line on the BISANCE system
of the French CITI2 biocomputing resource.
Contact: Philippe Dessen
dessen at frciti51.bitnet
=============================================================================
=============================================================================
Section: 8
Title : ENZYME news.
There are a few things we want to point out about release 3.0 of ENZYME as
well as about future releases.
1) Completeness
===============
Currently the data bank contains full information about the recommended name,
alternative name(s), catalytic activity and cofactor(s) of ALL 3071 enzymes.
The ENZYME data bank can now be considered fully operational.
2) The DI line
==============
As described in section 3 of this letter, a new line type 'DI' (= DIsease)
was implemented (starting with release 2.0) so as to add cross-references to
MIM (Mendelian Inheritance in Man).
The precise format of the DI line is:
DI DISEASE_NAME; MIM:NUMBER.
Where 'NUMBER' is the MIM catalog number of the disease (or phenotype).
Examples:
DI XANTHINURIA; MIM:278300.
DI PHENYLKETONURIA; MIM:261600.
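Because the DI line has a fixed layout, it can be split with one regular expression. The Python sketch below is based only on the format given above; the MIM number is kept as a string so that it is preserved exactly as written.

```python
import re

# "DI   DISEASE_NAME; MIM:NUMBER." (the name itself may contain parentheses)
_DI_LINE = re.compile(r"^DI\s+(.+); MIM:(\d+)\.$")


def parse_di(line):
    """Return (disease name, MIM catalog number) for one DI line."""
    match = _DI_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a DI line: {line!r}")
    return match.group(1), match.group(2)


print(parse_di("DI   XANTHINURIA; MIM:278300."))  # ('XANTHINURIA', '278300')
```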
3) Future releases
==================
Until new enzyme nomenclature data is published, we only plan to update the
SWISS-PROT pointers at each release of the protein sequence data bank,
correct any errors, and complete the information concerning synonyms and
cofactors using the literature.
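Since SWISS-PROT accession numbers are stable while entry names may change between releases, updating the pointers can be keyed on the accession number. In the Python sketch below, the accession-to-name table for the new release is assumed to be available (for instance extracted from the AC and ID lines), and the layout of three pairs per line with names padded to ten characters is our reading of the DR lines shown in section 3, not a format definition.

```python
def update_pointers(pairs, current_names):
    """Replace outdated entry names, keyed by (stable) accession number.

    Accessions missing from current_names keep their old name.
    """
    return [(acc, current_names.get(acc, name)) for acc, name in pairs]


def format_dr_lines(pairs, per_line=3):
    """Write (accession, name) pairs back as ENZYME DR lines."""
    items = [f"{acc}, {name.ljust(10)};" for acc, name in pairs]
    return ["DR   " + " ".join(items[i:i + per_line])
            for i in range(0, len(items), per_line)]
```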
4) An ASN.1 version of ENZYME
=============================
We will soon start to distribute a version of ENZYME in the ASN.1 syntax
which has been selected by the NCBI to facilitate the exchange of information
between biomolecular databases (see section 5 of this newsletter).
We will continue to distribute ENZYME in its current format, but there will
be two additional files:
ECSPEC.ASN: ENZYME database ASN.1 specification. This file describes the
syntax used by the ASN.1 version of the ENZYME data base.
ENZYME.ASN: ENZYME database in ASN.1 notation.
We will not list the full ENZYME database ASN.1 specification here, but to
give you a "flavor" of ASN.1, here is an example of an entry in both the
original and the ASN.1 format:
ID 1.4.3.14
DE L-LYSINE OXIDASE.
AN LYSYL OXIDASE.
CA L-LYSINE + O(2) + H(2)O = 2-OXO-6-AMINOHEXANOATE + NH(3) + H(2)O(2).
CF COPPER; PQQ.
CC -!- ALSO ACTS, MORE SLOWLY, ON L-ORNITHINE, L-PHENYLALANINE, L-ARGININE,
CC AND L-HISTIDINE.
DI CUTIS LAXA (EHLERS-DANLOS SYNDROME IX); MIM:304150.
DI LYSINE INTOLERANCE; MIM:247900.
DR P16636, LYOX$RAT ;
//
This entry is represented in the ASN.1 notation, following the specification
that we have developed for it, by:
Enzyme-activity ::= {
ecnumb {
class 1 ,
subclass 4 ,
sub-subclass 3 ,
serial-numb 14 } ,
status data
{
name "L-LYSINE OXIDASE." ,
synonyms { "LYSYL OXIDASE." } ,
reaction reac-equa {
left { { stoich "1" , compound { chem-name "L-LYSINE" } } ,
{ stoich "1" , compound { chem-name "O(2)" } } ,
{ stoich "1" , compound { chem-name "H(2)O" } } } ,
right { { stoich "1" , compound { chem-name "2-OXO-6-AMINOHEXANOATE" } } ,
{ stoich "1" , compound { chem-name "NH(3)" } } ,
{ stoich "1" , compound { chem-name "H(2)O(2)" } } } } ,
cofactors { { chem-name "COPPER" } ,
{ chem-name "PQQ" } } ,
comments { "ALSO ACTS, MORE SLOWLY, ON L-ORNITHINE, L-PHENYLALANINE,
L-ARGININE, AND L-HISTIDINE." } ,
disease { { disease-name "CUTIS LAXA (EHLERS-DANLOS SYNDROME IX)",
MIM-numb 304150 } ,
{ disease-name "LYSINE INTOLERANCE", MIM-numb 247900 } } ,
x-ref { { db-name "SPROT", ident-1 "P16636", ident-2 "LYOX$RAT" } }
}
}
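The reac-equa part of the ASN.1 record corresponds to the CA line of the flat-file entry. The Python sketch below illustrates that split under stated assumptions: a single ' = ' between the two sides, ' + ' between compounds, and an optional integer coefficient followed by a blank (stoichiometry 1 otherwise). It produces Python tuples rather than ASN.1 text and is an illustration only, not the program used to produce ENZYME.ASN.

```python
import re


def parse_reaction(ca_text):
    """Split a CA equation into (stoichiometry, compound) lists for each side."""
    left, right = ca_text.strip().rstrip(".").split(" = ")

    def side(text):
        terms = []
        for term in text.split(" + "):
            term = term.strip()
            # "2 H(2)O" has a coefficient; "2-OXO-..." does not (no blank).
            match = re.match(r"^(\d+) (.+)$", term)
            terms.append((int(match.group(1)), match.group(2)) if match else (1, term))
        return terms

    return side(left), side(right)


left, right = parse_reaction(
    "L-LYSINE + O(2) + H(2)O = 2-OXO-6-AMINOHEXANOATE + NH(3) + H(2)O(2).")
print(left)   # [(1, 'L-LYSINE'), (1, 'O(2)'), (1, 'H(2)O')]
print(right)  # [(1, '2-OXO-6-AMINOHEXANOATE'), (1, 'NH(3)'), (1, 'H(2)O(2)')]
```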
=============================================================================
=============================================================================
Section: 9
Title : Specialized databases part 1: the P450 database.
We will use this section to describe specialized biomolecular databases
which, in our opinion, are important, yet not very well known. In this first
issue we briefly describe:
********************************
* The cytochrome P450 database *
********************************
Produced by the group of Alexander Archakov at the Institute of Biological
and Medical Chemistry of the USSR Academy of Medical Sciences in Moscow, this
database contains a wealth of information on cytochromes P450: names,
sequences, genome location, inducers, substrates, etc. The database
supplements the book by A.I. Archakov and G.I. Bachmanova: "Cytochrome P-450
and active oxygen", published by Taylor and Francis Ltd in 1990.
The database is distributed, for MS/PC-DOS based systems, in two forms: the
first one, called DBCPD, runs under dBase III Plus; the second one, called
RBCPD, runs under Rbase. Both forms are menu-driven and very easy to use.
The group of Archakov can be contacted at the following address:
Prof. A.I. Archakov
Institute of Biological and Medical Chemistry
USSR Academy of Medical Sciences
Pogodinskaya str. 10
119838 Moscow
USSR
Fax: (+7) (095) 938 21 23
(+7) (095) 245 08 57
=============================================================================
====== End of GePSAN Newsletter Volume 1 - Number 1 =========================