IUBio

DB update; howto?

Peter Rice pmr at sanger.ac.uk
Fri Jun 30 03:24:37 EST 1995


In article <3su57h$j16 at news.rrz.uni-koeln.de> "Dr. Joerg Sprengel" <sprengel at scan.genetik.uni-koeln.de> writes:
>   What is the most elegant way to update to a newly arrived database in the GCG
>   package, i.e. how to keep the old (and running) configuration and, in
>   parallel, the new dataset? Is there ONE top-level symbol that could be
>   used?

The simple way is to redefine the top-level directory to point to a scratch
area (redefine the logical name EMBLDIR or GENBANKDIR), build the new release
there, and then switch when it is complete. That keeps the header files
looking right. Switching means either copying the files into the "old"
directory (recommended) or changing the logical name in GCG (this means users
have to start GCG again to use the new databases).
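
As a rough illustration of the copy-based switch, here is a minimal Python
sketch. It is not anything GCG provides: the function name and the two
directory arguments are invented for the example, and it assumes the new
release has already been built completely in the scratch directory.

```python
import shutil
from pathlib import Path


def switch_release(scratch_dir, live_dir):
    """Copy every file of a completed build from scratch_dir into live_dir.

    Hypothetical helper: both directory names are supplied by the caller.
    Returns the sorted list of file names that were copied.
    """
    scratch = Path(scratch_dir)
    live = Path(live_dir)
    live.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(scratch.iterdir()):
        if src.is_file():
            # Copy under a temporary name first, then rename, so that a
            # running job never sees a half-written file under the final name.
            tmp = live / (src.name + ".tmp")
            shutil.copy2(src, tmp)
            tmp.replace(live / src.name)
            copied.append(src.name)
    return copied
```

Each file is swapped individually, so the release as a whole is still
inconsistent for a short time while the copy runs. Pausing the batch queues
during the switch avoids that.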

Many sites, I suspect, don't bother - that method needs more disk space
(room for a spare release as well as the original files) than the alternative
of pausing the batch queues and updating in place once the batch runs have
completed.

>   How do you generate the non-redundant dataset EMBL+GenBank? Should be possible
>   to take SRS for this task.

GCG provides this with ACCESSIONNUMBERS - which works both ways. But that
requires space to install a full EMBL and GENBANK release.

If you do intend to keep the full EMBL and GENBANK around, then yes, you
can use SRS - just pick all GENBANK entries not referenced by EMBL (you
are in Germany so I assume you want EMBL as the main database :-). But you
can only do this after you have indexed the complete new database.
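
The selection itself is a set difference on accession numbers. The Python
sketch below only illustrates that logic. It is not SRS query syntax, and the
function name, the input layout (a mapping from entry name to its accession
numbers) and the sample accessions in the usage note are all made up.

```python
def genbank_only(genbank_entries, embl_accessions):
    """Return the names of GenBank entries that EMBL does not reference.

    genbank_entries: dict mapping entry name -> list of accession numbers
    embl_accessions: iterable of every accession number found in EMBL
    (Both input shapes are assumptions made for this sketch.)
    """
    embl = set(embl_accessions)
    # Keep an entry only if none of its accession numbers appears in EMBL.
    return [name for name, accessions in genbank_entries.items()
            if embl.isdisjoint(accessions)]
```

For example, with entries {"GB1": ["X00001"], "GB2": ["M99999"]} and EMBL
accessions ["X00001"], only "GB2" would be kept. The EMBL accession list has
to come from the fully indexed new release, which is why the indexing must
finish first.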

SRS is also very useful if you want a specific subset (by organism
for example) to index in GCG (use the original format) or for BLAST
(use FASTA output format).
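
To show what such a subset looks like in FASTA format, here is a small Python
sketch. It does not use SRS. The record layout (dictionaries with "id",
"description", "organism" and "sequence" keys) and the function name are
assumptions for the example only.

```python
def fasta_subset(records, organism, width=60):
    """Return the records for one organism as FASTA-formatted text.

    records: iterable of dicts with "id", "description", "organism" and
    "sequence" keys (a hypothetical layout chosen for this sketch).
    """
    lines = []
    for rec in records:
        if rec["organism"] != organism:
            continue
        # FASTA header line: ">" followed by the identifier and description.
        lines.append(">" + rec["id"] + " " + rec["description"])
        seq = rec["sequence"]
        # Wrap the sequence at a fixed line width.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "".join(line + "\n" for line in lines)
```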

I prefer to use just one database (EMBL of course :-) and add the new
entries to it to make "GenEmbl". You can pick up the new entries
by ftp weekly (or daily if you really need to) and put them easily
into GCG and SRS.
--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr at sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England



More information about the Info-gcg mailing list
