
North American access to EMBL 30 and update flat files

J. Michael Cherry cherry at frodo.mgh.harvard.edu
Sun May 3 15:21:00 EST 1992



Announcing North American ftp access to the EMBL sequence database and
updates. 

The latest EMBL database quarterly release (version 30) and updates,
obtained from bioftp.unibas.ch every Sunday morning Boston time, are
available via anonymous ftp from amber.mgh.harvard.edu (IP
132.183.190.26).  All files are compressed text files in the EMBL flat
file format. 

These files are made available with the assistance of Reinhard Doelz
(doelz at urz.unibas.ch) at EMBnet Switzerland in Basel, and the EMBL in
Heidelberg, Germany. 

Many sites that keep the full GenBank release online have found it
useful to process the EMBL database looking for sequences that are
unique to EMBL.  There are hundreds of sequences in EMBL that are not in
GenBank.  The opposite case is also true. 

I am providing access to the EMBL flat files because we have a good
Internet connection, which minimizes trans-Atlantic transfers and
allows faster transfers to sites within North America and to the west. 
I plan to provide these files as long as there is demand and will
obtain the EMBL quarterly releases as soon as they are available. 

See the 000readme.txt file in the embl subdirectory on amber for
checksums of the quarterly release data files and other general
information about the update files.  There are two update files: one
contains all new sequences (XEMBL.FLAT_Z) and the other contains updated
entries for sequences present in the latest quarterly release
(XXEMBL.FLAT_Z). 

All files are compressed, so remember to transfer in binary mode.  If you
are transferring to a VMS system, amber's ftp server will do the VMS
STRUCTURE mode of transfer.  The compressed files end with "_Z" and may
need to be renamed to ".Z" for Unix uncompress.  The VMS compress
utilities are available from genbank.bio.net and other sites via
anonymous ftp; look for LZDCMP. 
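
As a rough illustration of the transfer notes above, the following
Python sketch fetches one file by anonymous ftp in binary mode and
stores it under a name ending in ".Z".  The function names unix_name
and fetch_binary are hypothetical.  The host, directory, and file names
are the ones given in this announcement, and the server may not be
reachable from every site.

```python
from ftplib import FTP


def unix_name(remote_name: str) -> str:
    """Map a name ending in "_Z" to the ".Z" suffix that Unix uncompress expects."""
    if remote_name.endswith("_Z"):
        return remote_name[:-2] + ".Z"
    return remote_name


def fetch_binary(host: str, directory: str, remote_name: str) -> str:
    """Download one file by anonymous ftp in binary mode; return the local name."""
    local_name = unix_name(remote_name)
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        ftp.cwd(directory)
        with open(local_name, "wb") as out:
            # retrbinary switches the connection to binary (image) type first
            ftp.retrbinary("RETR " + remote_name, out.write)
    return local_name
```

A call such as fetch_binary("amber.mgh.harvard.edu", "embl",
"XEMBL.FLAT_Z") would be expected to leave XEMBL.FLAT.Z in the current
directory, ready for uncompress.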

If there is sufficient interest, I will also make the processed EMBL
database available in GCG 7 format.  If you are interested, please let me
know whether you could use a VMS BACKUP saveset.  The processed EMBL
database sections contain only sequences that are in neither GenBank nor
the current GenBank update, as judged by primary accession number. 
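
For sites that want to do a similar comparison themselves, here is a
minimal Python sketch of filtering on primary accession numbers.  It is
not the procedure used to build the processed sections, which is not
described here.  It assumes that each flat-file entry ends with a "//"
line and that the first accession number on the first AC line is the
primary one.  The function names and the sample entries are invented
for illustration.

```python
from typing import Iterable, Iterator, List, Optional, Set


def iter_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each flat-file entry as a list of lines, "//" terminator included."""
    entry: List[str] = []
    for line in lines:
        entry.append(line.rstrip("\n"))
        if line.startswith("//"):
            yield entry
            entry = []


def primary_accession(entry: List[str]) -> Optional[str]:
    """Return the first accession number on the first AC line, if any."""
    for line in entry:
        if line.startswith("AC "):
            fields = line[2:].replace(";", " ").split()
            if fields:
                return fields[0]
    return None


def unique_to_embl(lines: Iterable[str], genbank: Set[str]) -> Iterator[List[str]]:
    """Yield entries whose primary accession is absent from the GenBank set."""
    for entry in iter_entries(lines):
        accession = primary_accession(entry)
        if accession is not None and accession not in genbank:
            yield entry


# Invented sample data, shaped like flat-file entries for illustration only.
SAMPLE = """\
ID   AAAA0001   standard; DNA; PRO; 10 BP.
AC   X00001; X00009;
SQ   Sequence 10 BP;
     acgtacgtac
//
ID   BBBB0002   standard; DNA; PRO; 8 BP.
AC   X00002;
SQ   Sequence 8 BP;
     ttgacctg
//
"""

kept = list(unique_to_embl(SAMPLE.splitlines(), {"X00002"}))
```

In this sample only the first entry survives, because X00002 is in the
set standing in for the GenBank primary accession numbers.  Comparing
on the primary accession alone will miss entries that were renumbered
or merged, so the result is best treated as a first pass.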

Mike Cherry
cherry at frodo.mgh.harvard.edu
Director of Computing, Molecular Biology
Massachusetts General Hospital, Boston



