I am currently working on providing access to the sequence databases
and search/comparison software for some Russian scientists. This
is necessary because there is no FTP access from Russia to the US
[courtesy of our paranoid government :-|] and because of problems
with the reliability and cost of E-mail within Russia. The only machines
usually available in Russia are PCs, which makes all of the VAXen/Sun/SGI
software irrelevant unless it is ported to S5 UNIX or DOS.
I am planning to use the updates from GB to produce an up-to-date copy
of the DB under System 5 on a 386 and then send tapes/disks to Russia
every few months. Eventually, if the E-mail/FTP access problems are
resolved, the use of the software to update the existing DB could be
transferred directly to Russia.
I have the GBUPDATE package by Brent Hobbs, but this seems to create
a single file containing the entire database, which is bad for S5 UNIX
(due to ulimits and multiple indirect blocks) and worse for DOS
(file system size limits). There seem to be messy tradeoffs between
organizing this in flat files (perhaps in directories for each organism),
which would require extra time to open each file, versus large files,
which would require the creation of some fancy indexing scheme.
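
As a rough illustration of the large-file option, here is a minimal
sketch in Python of a byte-offset index. It assumes the flat-file layout
in which each entry starts with a LOCUS line and ends with a line
beginning with "//". The names build_index and fetch_entry are
hypothetical and are not part of GBUPDATE; a real index would probably
be written to disk rather than held in memory.

```python
import io


def build_index(handle):
    """Map each LOCUS name to the byte offset where its entry starts.

    `handle` must be a binary file object positioned at the start.
    """
    index = {}
    while True:
        offset = handle.tell()      # position before reading the line
        line = handle.readline()
        if not line:                # end of file
            break
        if line.startswith(b"LOCUS"):
            fields = line.split()
            if len(fields) >= 2:    # second field is the entry name
                index[fields[1].decode("ascii")] = offset
    return index


def fetch_entry(handle, index, name):
    """Seek straight to one entry and read it up to its '//' terminator."""
    handle.seek(index[name])
    lines = []
    for line in iter(handle.readline, b""):
        lines.append(line)
        if line.startswith(b"//"):
            break
    return b"".join(lines).decode("ascii")
```

With such an index, retrieving one entry costs a single seek regardless
of file size, so the database could be cut into a handful of volumes
that each stay under the ulimit or DOS size limit, with one small index
per volume, instead of one file per entry.
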
Is anyone keeping the database in some kind of flat file system who
can give me some advice on what works and what doesn't?
If necessary, I could use a database like Oracle to manage this.
Are the descriptions of tables/views/indexes used by the db managers
at the various sites (GB/NCBI/EMBL) available? What about the
data loading software? Has anyone tried to replicate/update the
database using the updates with a DBMS or is everyone using GCG on VMS?
And lastly, is there a complete "English" description of the information
in the database entries?
Robert Bradbury uunet!sftwks!bradbury
Death is an imposition on the human race, and no longer acceptable
Alan Harrington, The Immortalist (1969)