Subject: "SynCron" tools for maintaining synchronised copies of the EMBL
Nucleotide Sequence Database
Looking at the access statistics for the EMBL Nucleotide Sequence Database
update files on the EBI ftp site (ftp://ftp.ebi.ac.uk/pub/databases/embl/new/),
we observe that many people download the full cumulative data file
(cumulative.dat) rather than the daily update files. To make the daily update
files more useful, and to provide a reliable mechanism for regenerating the
cumulative.dat file locally from daily updates, the EBI and the Swiss EMBnet
node have jointly developed a set of tools that fetch the daily updates and
apply them to the local database.
The programs make use of 'transaction listings' made available on the EBI ftp
site. These transaction listings are now supplied with every update file and
include a record of each update, insert and delete operation to the EMBL
Nucleotide Sequence Database as represented in the flat-file updates.
Transaction listings follow the same naming scheme as the daily, weekly and
cumulative update files, but with the extension ".lis". The transaction
listings are found in:
ftp://ftp.ebi.ac.uk/pub/databases/embl/new/list/
and look like:
  Acc#    ID     Action  DateStamp       Ver#  Division
  T58328  AA328  U       19951108232958  3     EST
  T58329  AA329  C       19951108233007  3     EST
  T58330  AA330  D       19951108233015  3     EST
  R67977  AA977  U       19951108230600  3     EST
  R67978  AA978  U       19951108230611  3     EST
where U=Update, C=Create, D=Delete.
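For readers who want to inspect a transaction listing themselves, the
following is a minimal sketch of a parser for the format shown above. It is
not part of the SynCron tools. It assumes whitespace-separated columns, one
header line beginning with "Acc#", and a DateStamp of the form YYYYMMDDhhmmss
(an interpretation of the sample values). The names Transaction and
parse_listing are hypothetical.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical record type; field names mirror the column headers above.
Transaction = namedtuple(
    "Transaction", "accession entry_id action timestamp version division"
)

# Action codes as documented in the announcement.
ACTIONS = {"U": "update", "C": "create", "D": "delete"}


def parse_listing(lines):
    """Parse transaction listing lines into Transaction records.

    Assumes six whitespace-separated columns per record. The header line
    and any line without exactly six fields are skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 6 or fields[0] == "Acc#":
            continue
        acc, entry_id, action, stamp, version, division = fields
        if action not in ACTIONS:
            raise ValueError("unknown action code: %r" % action)
        records.append(
            Transaction(
                acc,
                entry_id,
                action,
                # Assumed layout: YYYYMMDDhhmmss
                datetime.strptime(stamp, "%Y%m%d%H%M%S"),
                int(version),
                division,
            )
        )
    return records
```

Each record can then be filtered by action or division, for example to list
the accession numbers deleted in a given update.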
Using the tools it is possible to regenerate the cumulative.dat file (at a
remote site) reliably from daily updates. Validation of the new
cumulative.dat file is also possible using the transaction listing provided at
the EBI.
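The idea behind such a validation can be illustrated with a short sketch. It
is not the algorithm used by the SynCron programs, which is not described
here. It assumes that a local copy is summarised by its set of accession
numbers, that Create and Update both leave the entry present, and that Delete
removes it. The function names replay and compare are hypothetical.

```python
def replay(accessions, transactions):
    """Apply (accession, action) pairs, in listing order, to a set of
    accession numbers and return the resulting set.

    Assumption: 'C' and 'U' leave the entry present, 'D' removes it.
    """
    current = set(accessions)
    for accession, action in transactions:
        if action == "D":
            current.discard(accession)
        elif action in ("C", "U"):
            current.add(accession)
        else:
            raise ValueError("unknown action code: %r" % action)
    return current


def compare(expected, local):
    """Return (missing, extra): accession numbers expected but absent from
    the local copy, and accession numbers present locally but not expected."""
    expected, local = set(expected), set(local)
    return sorted(expected - local), sorted(local - expected)
```

If both lists returned by compare are empty, the local copy contains exactly
the entries implied by the transaction listing. This checks only which
entries are present, not their contents or version numbers.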
Using these programs it should be possible to keep a copy of the EMBL
Nucleotide Sequence Database that exactly matches the contents of the database
in operation at the EBI for external services. Manual intervention should be
required only in the event of a failure, for example in the network transfer
of a file.
These programs are available by anonymous ftp from the locations below (the
_002 version number will change as the programs are updated):
UNIX Version:
ftp://ftp.ebi.ac.uk/pub/software/unix/listtools/SynCron_002.tar.gz
VMS Version:
(backup/gzip)
ftp://ftp.ebi.ac.uk/pub/software/vms/listtools/SynCron_002.bck-gz
OR
(tar/compress)
ftp://ftp.ebi.ac.uk/pub/software/vms/listtools/SynCron_002.tar_Z
Matteo diTommaso
Database Programming Group
EMBL Outstation
The European Bioinformatics Institute
E-mail: ditommaso at ebi.ac.uk
Nicole Redaschi and Reinhard Doelz
Biozentrum - University of Basel
EMBnet Node Switzerland
E-Mail: embnet at comp.bioz.unibas.ch