THE DEF DATA BASE AND MAIL SERVICE FOR
SEQUENCE BASED PROTEIN FOLD CLASS PREDICTIONS
Martin Reczko
Department of Molecular Biophysics, German Cancer Research Center,
Heidelberg, Germany and
Henrik Bohr
Center for Biological Sequence Analysis,
The Technical University of Denmark,
Lyngby, Denmark
The DEF (Database for Expected Fold-classes) data base and mail service
generate protein fold-class and protein domain predictions for sequences
in the SWISSPROT protein sequence data base or for individually
submitted sequences. In the DEF output, a sequence of amino acids is
assigned a specific overall fold-class, a super fold-class with respect
to secondary structure content and spatial distribution, and a profile
of possible fold-classes along the sequence. The definition of protein
domains is derived from this fold-class profile. Each assigned
fold-class is one out of 45
well-known folds derived from the 3-dimensional protein structures in
the Brookhaven Protein Data Bank, PDB. Most of these 45 fold-classes
are contained in the set "3d-ali" given by Pascarella and Argos,
Prot. Eng. 5:121-137 (1992). In this context folds are protein domains
with a distinct back-bone topology of their 3-dimensional structure.
Performance
The prediction of the 44 classes is correct in 77 % of 130 test cases
(a random prediction is 2.3 % correct). Sequences with 0 to 25 %
sequence identity to proteins of the training set are predicted
correctly in more than 70 % of the cases.
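As a quick check of the figures above, the chance level and the
approximate number of correctly predicted test cases can be worked out
in a few lines. This is only a back-of-the-envelope sketch: it assumes
that a "random prediction" means a uniform guess among the 44 classes,
and the variable names are illustrative.

```python
# Back-of-the-envelope check of the reported figures.
# Assumption: a random prediction picks one of the classes uniformly.
n_classes = 44            # number of classes quoted in this paragraph
n_test_cases = 130        # size of the test set quoted in this paragraph
reported_accuracy = 0.77  # fraction of correct fold-class predictions

random_baseline = 1.0 / n_classes
approx_correct = round(reported_accuracy * n_test_cases)

print(f"random baseline: {random_baseline:.1%}")
print(f"approx. correct: {approx_correct} of {n_test_cases}")
```

Under that assumption the chance level rounds to the 2.3 % quoted
above, and 77 % of 130 corresponds to roughly 100 test cases.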
The 4 superclasses are all-alpha, alpha*beta, alpha+beta, and
all-beta. The alpha*beta superclass stands here for alpha-helices and
beta-sheets intertwined, while the alpha+beta superclass has
alpha-helices and beta-sheets separated in distinct domains. The
prediction of the 4 superclasses is correct in 90.4 % of the test cases.
The predictions are generated by artificial neural networks as described in
Reczko, M. and Bohr, H., The DEF Data Base of Sequence Based Protein
Fold Class Predictions, Nucl. Ac. Res. 22, p. 3616-3619 (1994)
Reczko, M., Bohr, H., Sudhakar, P. V., Hatzigeorgiou, A.,
Subramaniam, S., Fold Class Prediction by Neural Networks, In:
Protein Structures by Distance Analysis, p. 277-286, Eds. Bohr, H. and
Brunak, S., IOS Press (1994)
Availability:
The DEF mailserver for individual predictions:
An automatic mail server that can make fold-class predictions for any
sequence submitted. Just send a mail to
def at mbp-sgi4.inet.dkfz-heidelberg.de
containing your sequence in single letter code in the Subject
line or in the mail text with an empty Subject line.
Sequence lines longer than 120 residues must be separated
by carriage returns; shorter lines are OK.
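The line-length rule above can be met by breaking the sequence before
pasting it into the mail text. The following is a minimal sketch, not
part of the DEF service: the function name format_sequence and the
example sequence are made up for illustration, and no check of the
residue letters is attempted.

```python
MAX_LINE = 120  # longest sequence line allowed, per the text above


def format_sequence(sequence: str, width: int = MAX_LINE) -> str:
    """Strip whitespace and break the sequence into lines of at most
    `width` residues, separated by newlines."""
    residues = "".join(sequence.split())
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join(lines)


# Made-up example sequence, used only to show the line breaking.
example = "MKTAYIAKQR" * 30  # 300 residues
mail_text = format_sequence(example)
print(mail_text)
```

For the 300-residue example this yields two lines of 120 residues and
one of 60, which can then be sent as the mail text with an empty
Subject line.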
Anonymous ftp address:
mbp-sgi4.inet.dkfz-heidelberg.de or 193.174.48.50
cd /pub/databases/def
Currently HUMAN, ECOLI, YEAST, MOUSE, DROME (Drosophila melanogaster),
CAEEL (Caenorhabditis elegans), BOVINE and RAT
proteins are available.
*** Other proteins may be predicted using the DEF mailserver ***
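The anonymous ftp steps above could also be scripted. This is a sketch
using Python's standard ftplib module, assuming the server is reachable
and accepts anonymous logins; the function name list_def_files and the
timeout value are illustrative choices, and the layout of the files
inside the directory is not described here, so only the names are
listed.

```python
from ftplib import FTP

HOST = "mbp-sgi4.inet.dkfz-heidelberg.de"  # host name given above
DEF_DIR = "/pub/databases/def"             # directory given above


def list_def_files(host: str = HOST, directory: str = DEF_DIR,
                   timeout: float = 30.0) -> list[str]:
    """Log in anonymously, change to the DEF directory and return
    the names of the entries found there."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()         # no arguments: anonymous login
        ftp.cwd(directory)
        return ftp.nlst()
```

Calling list_def_files() opens the connection and returns the names as
a list of strings; nothing is downloaded.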
Contact:
Martin Reczko, Molecular Biophysics (0810)
German Cancer Research Center, 69120 Heidelberg, Germany.
Telephone: +49-6221-422338, Telefax: +49-6221-422885
email: reczko at dkfz-heidelberg.de
--
__________________________________________________
Dept. of Molecular Biophysics (0810)
German Cancer Research Center
Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
Tel: (49) 6221-422338, FAX: (49) 6221-422333
email: reczko at dkfz-heidelberg.de