Release 21 of TREMBL, a protein sequence database supplementing SWISS-PROT

Maria Jesus Martin martin at ebi.ac.uk
Mon Jun 24 04:00:07 EST 2002


INTRODUCTION
============

TrEMBL is a computer-annotated protein sequence database
supplementing the SWISS-PROT Protein Knowledgebase. TrEMBL
contains the translations of all coding sequences (CDS)
present in the EMBL/GenBank/DDBJ Nucleotide Sequence
Databases, together with protein sequences extracted from
the literature or submitted to SWISS-PROT that are not yet
integrated into SWISS-PROT. TrEMBL can be considered a
preliminary section of SWISS-PROT. SWISS-PROT accession
numbers have been assigned to all TrEMBL entries that
should eventually be upgraded to standard SWISS-PROT
quality.

RELEASE 21.0 OF TrEMBL
======================

This TrEMBL release was created from release 70 of the EMBL
Nucleotide Sequence Database, with updates up to 07.05.02.
It contains 751'148 entries and 218'504'701 amino acids.
To minimize redundancy, the translations of all coding
sequences (CDS) in the EMBL Nucleotide Sequence Database
that are already included in SWISS-PROT release 40, with
updates up to 21.06.02, have been removed from TrEMBL
release 21.

TrEMBL is split into two main sections: SP-TrEMBL and
REM-TrEMBL. SP-TrEMBL (SWISS-PROT TrEMBL) contains the
entries (671'580) that should eventually be incorporated
into SWISS-PROT. SWISS-PROT accession numbers have
been assigned to all SP-TrEMBL entries.

SP-TrEMBL is organized in subsections:

arc.dat (Archaea):                          1644 entries
arp.dat (Complete Archaeal Proteomes):     29757 entries
fun.dat (Fungi):                           14606 entries
hum.dat (Human):                           29766 entries
inv.dat (Invertebrates):                   68301 entries
mam.dat (Other Mammals):                   10511 entries
mhc.dat (MHC proteins):                     8069 entries
org.dat (Organelles):                      58906 entries
phg.dat (Bacteriophages):                   5676 entries
pln.dat (Plants):                          67339 entries
pro.dat (Prokaryotes):                     66680 entries
prp.dat (Complete Prokaryotic Proteomes): 123685 entries
rod.dat (Rodents):                         27467 entries
unc.dat (Unclassified):                      143 entries
vrl.dat (Viruses):                         71522 entries
vrt.dat (Other Vertebrates):               12279 entries
vrv.dat (Retroviruses):                    75229 entries

56'042 new entries have been integrated in SP-TrEMBL.
The sequences of 810 SP-TrEMBL entries have been updated
and the annotation has been updated in 189'983 entries.

In the document deleteac.txt, you will find a list of all
accession numbers which were previously present in TrEMBL,
but which have now been deleted from the database.

REM-TrEMBL (REMaining TrEMBL) contains the entries (79'568)
that we do not want to include in SWISS-PROT.
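
The figures quoted in this section can be cross-checked
against each other. The short Python sketch below is not
part of the release; it only re-adds the numbers printed
above. The dictionary keys are the subsection file
prefixes, and the variable names are illustrative.

```python
# Entry counts copied from the SP-TrEMBL subsection list above.
sp_trembl = {
    "arc": 1644, "arp": 29757, "fun": 14606, "hum": 29766,
    "inv": 68301, "mam": 10511, "mhc": 8069, "org": 58906,
    "phg": 5676, "pln": 67339, "pro": 66680, "prp": 123685,
    "rod": 27467, "unc": 143, "vrl": 71522, "vrt": 12279,
    "vrv": 75229,
}

# Totals as stated in the announcement text.
SP_TREMBL_STATED = 671_580
REM_TREMBL_STATED = 79_568
RELEASE_STATED = 751_148

# Sum of the 17 subsections, then add REM-TrEMBL for the release total.
sp_total = sum(sp_trembl.values())
release_total = sp_total + REM_TREMBL_STATED

print(sp_total, sp_total == SP_TREMBL_STATED)            # 671580 True
print(release_total, release_total == RELEASE_STATED)    # 751148 True
```

The 17 subsections add up to the 671'580 SP-TrEMBL entries,
and SP-TrEMBL plus REM-TrEMBL gives the 751'148 entries
reported for the release.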

ACCESS/DATA DISTRIBUTION
========================

FTP server:     ftp.ebi.ac.uk/pub/databases/trembl
SRS server:     http://srs.ebi.ac.uk/

TrEMBL is also available on the SWISS-PROT CD-ROM.
SWISS-PROT + TrEMBL is searchable on the following
servers at the EBI:

FASTA3  (http://www.ebi.ac.uk/fasta33/)
BLAST2  (http://www.ebi.ac.uk/blast2/)
Bic_sw  (http://www.ebi.ac.uk/bic_sw/)
Scanps  (http://www.ebi.ac.uk/scanps/)
MPSrch  (http://www.ebi.ac.uk/MPsrch/)

For each TrEMBL release, a synchronized version of the
concurrent SWISS-PROT release is distributed at
ftp.ebi.ac.uk/pub/databases/trembl/swissprot/

We also produce a complete non-redundant protein sequence
collection every week, distributed as three compressed
files: sprot.dat.gz, trembl.dat.gz and trembl_new.dat.gz.
These files are in the directory /pub/databases/sp_tr_nrdb
on the EBI FTP server and in databases/sp_tr_nrdb on the
ExPASy server.
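
As a rough illustration of how the three weekly files might
be handled once downloaded, the Python sketch below builds
their FTP locations from the host and directory named above
and counts entries and amino acids in a flat file. It
assumes the usual SWISS-PROT flat-file layout, in which each
entry ends with a "//" line and the "SQ" line carries the
sequence length. The function names and the sample entry
are made up for this example, and the sketch performs no
download itself.

```python
import gzip
from typing import Iterable, List, Tuple

# Host, directory and file names as given in the announcement.
FTP_HOST = "ftp.ebi.ac.uk"
NRDB_DIR = "/pub/databases/sp_tr_nrdb"
NRDB_FILES = ("sprot.dat.gz", "trembl.dat.gz", "trembl_new.dat.gz")


def nrdb_urls() -> List[str]:
    """Return the FTP URL of each weekly file (hypothetical helper)."""
    return [f"ftp://{FTP_HOST}{NRDB_DIR}/{name}" for name in NRDB_FILES]


def count_entries(lines: Iterable[str]) -> Tuple[int, int]:
    """Count entries and amino acids in flat-file lines.

    Assumption: every entry ends with a '//' line, and the SQ line
    looks like 'SQ   SEQUENCE   <length> AA; ...'.
    """
    entries = 0
    residues = 0
    for line in lines:
        if line.startswith("//"):
            entries += 1
        elif line.startswith("SQ   "):
            # Third whitespace-separated field is the sequence length.
            residues += int(line.split()[2])
    return entries, residues


def count_file(path: str) -> Tuple[int, int]:
    """Count entries and amino acids in a local gzip-compressed file."""
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        return count_entries(handle)


# Invented two-entry sample, used only to exercise count_entries().
sample = [
    "ID   EXAMPLE1   PRELIMINARY;   PRT;   5 AA.\n",
    "SQ   SEQUENCE   5 AA;  600 MW;  0000000000000000 CRC64;\n",
    "     MKTAY\n",
    "//\n",
    "ID   EXAMPLE2   PRELIMINARY;   PRT;   3 AA.\n",
    "SQ   SEQUENCE   3 AA;  400 MW;  0000000000000000 CRC64;\n",
    "     MKT\n",
    "//\n",
]
print(count_entries(sample))  # (2, 8)
```

Calling count_file() on a downloaded copy of trembl.dat.gz
would return the entry and amino acid counts for that
weekly snapshot. Because the snapshots are rebuilt every
week, these counts need not match the release figures
quoted above.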

TrEMBL HAS BEEN PREPARED BY:
============================

Maria Jesus Martin, Claire O'Donovan, Allyson Williams,
Daniel Barrell, Philippe Aldebert, Rolf Apweiler,
Kirsty Bates, Paul Browne, Sergio Contrino, Kirill
Degtyarenko, Gill Fraser, Henning Hermjakob, Kati Laiho,
Alexander Kanapin, Youla Karavidopoulou, Paul Kersey,
Minna Lehvaslaiho, Michele Magrane, Virginie Mittard,
Nicola Mulder, John F. O'Rourke, Sandra Orchard,
Sandra van den Broek, Eleanor Whitfield and  at the
EMBL Outstation - European Bioinformatics Institute (EBI)
in Hinxton, UK;
Amos Bairoch, Alain Gateau, Alexandre Gattiker, Isabelle
Phan and Sandrine Pilbout at the Swiss Institute of
Bioinformatics in Geneva, Switzerland.


---------------------------------------------
Maria Jesus Martin                     email:martin at ebi.ac.uk
EMBL Outstation EBI
(European Bioinformatics Institute)    URL: http://www.ebi.ac.uk
Wellcome Trust Genome Campus           Tel: +44 (1223) 494408
Hinxton                                fax: +44 (1223) 494468
Cambridge
CB10 1SD UK
----------------------------------------------