GCG supply a number of utilities for converting between other formats
and GCG's own database format. For example, the command "genbanktogcg"
converts GenBank flat files to GCG databases. You will usually also need
to build compressed BLAST indices, and SRS and/or lookup indices, if you
want to support these. All of these are distributed by GCG preformatted;
hence the 4 Gb. To reformat manually, you need to have the flat files
too, at least temporarily; hence the 8 Gb, though if you do the reformat
section by section you can reduce the space overhead. However, remember
that what needs 8 Gb now may well need 16 Gb next year.
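
To make the space arithmetic concrete, here is a rough Python sketch. The
section sizes are made-up numbers, not real release figures, and it assumes
each flat file is about as big as its formatted section and is deleted as
soon as it has been converted. Treat it as a back-of-envelope estimate only.

```python
def peak_space_gb(section_sizes_gb, all_at_once=True):
    """Estimate peak disk use (in Gb) while reformatting.

    section_sizes_gb: formatted size of each database section
    (hypothetical values). Assumption: each flat file is roughly the
    same size as its formatted section and is removed once converted.
    """
    formatted_total = sum(section_sizes_gb)
    if all_at_once:
        # All flat files sit on disk next to all formatted sections.
        return formatted_total + sum(section_sizes_gb)
    # Section by section: at worst, the largest flat file is the only
    # extra space needed on top of the formatted data.
    return formatted_total + max(section_sizes_gb)
```

With four hypothetical sections of 1 Gb each, doing everything at once
comes to 8 Gb, while going section by section peaks at about 5 Gb.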

I reckon it is worth investing some time in automating the process of
obtaining and reformatting the database updates the way you want them.
Then the system just runs itself nightly (or whatever) without
intervention until major updates. GCG provide some scripts for this sort
of thing, though we prefer to write our own, so I can't comment on their
efficacy. One thing (problem?) is that we (like, I think, many European
sites) use EMBL rather than GenBank as the primary DNA database, so the
whole system has to be reconfigured anyway.
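
For what it's worth, the "runs itself nightly" idea might look something
like the Python sketch below. This is not GCG's script or ours, just an
illustration: the directory layout, the ".done" stamp files and the
function names are all invented here, and the actual conversion is left
as a function you supply (for instance a wrapper around whatever reformat
command your site uses).

```python
import os

def stale_flat_files(flat_dir, stamp_dir):
    """Return flat files newer than their '.done' stamp, or with no stamp."""
    stale = []
    for name in sorted(os.listdir(flat_dir)):
        flat = os.path.join(flat_dir, name)
        if not os.path.isfile(flat):
            continue  # skip subdirectories
        stamp = os.path.join(stamp_dir, name + ".done")
        if (not os.path.exists(stamp)
                or os.path.getmtime(flat) > os.path.getmtime(stamp)):
            stale.append(flat)
    return stale

def nightly_update(flat_dir, stamp_dir, convert):
    """Run convert(path) on each stale flat file, then refresh its stamp.

    convert is supplied by the caller (hypothetical hook); it should
    raise an exception on failure so that no stamp gets written.
    """
    os.makedirs(stamp_dir, exist_ok=True)
    done = []
    for flat in stale_flat_files(flat_dir, stamp_dir):
        convert(flat)
        stamp = os.path.join(stamp_dir, os.path.basename(flat) + ".done")
        with open(stamp, "w"):
            pass  # create the stamp file if it does not exist yet
        os.utime(stamp, None)  # set its modification time to now
        done.append(flat)
    return done
```

Run from cron each night, something like this only touches the sections
whose flat files have changed since the last successful conversion.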

You can get the GenBank flat files by FTP from ncbi.nlm.nih.gov. You
will also need access to EMBL, SWISS-PROT, PROSITE and more; the EBI
(ftp.ebi.ac.uk) is very convenient for us, but I guess there is a nearer
archive in the US.
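
If you want to script the fetching, a minimal sketch with Python's
standard ftplib module could look like this. The remote directory is
something you would have to look up on the server yourself (I am not
quoting real paths here), the function names are invented, and it only
compares file names, so a file that was updated in place under the same
name would not be fetched again.

```python
import os
from ftplib import FTP

def files_to_fetch(remote_names, local_dir):
    """Return the remote file names that are not yet in local_dir."""
    have = set(os.listdir(local_dir)) if os.path.isdir(local_dir) else set()
    return [name for name in remote_names if name not in have]

def fetch_new(host, remote_dir, local_dir):
    """Anonymous FTP: download files in remote_dir that are missing locally.

    Assumes remote_dir contains plain files only; a subdirectory in the
    listing would make the RETR command fail.
    """
    os.makedirs(local_dir, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login()  # no arguments means anonymous login
        ftp.cwd(remote_dir)
        wanted = files_to_fetch(ftp.nlst(), local_dir)
        for name in wanted:
            with open(os.path.join(local_dir, name), "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)
    return wanted
```

You would call fetch_new with one of the hosts above, the directory you
found on it, and a local scratch directory with enough room for the flat
files.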

