
Genbank to GCG Help!

richk at mbcg.uchc.edu
Fri Apr 1 11:20:35 EST 1994


In Article <0097C37B.32C6EB80 at vms.csd.mu.edu>
6566friedman at vms.csd.mu.edu writes:
>
>I am looking for some technical assistance in converting the genbank 
>database available from the NLM archive into a VMS GCG usable format.  
>The LZ decompression file from ncbi.nlm.nih.gov bombed.  I understand 
>that LVDCMP.EXE from net.bio.net works better.  However our systems 
>analyst doesn't know what to do with the many files that output when we 
>run "Genbank to GCG".  Any suggestions???
>
Hi,

I've gotten it pretty well automated; the only real necessity is a LOT of
disk space. After downloading the individual databases via FTP in BINARY
mode, I run the decompression and reformatting programs in a batch queue.
To get the files I use the standard GCG-supplied GETFILE.COM and GENFTP.COM,
driven by a command file that I execute once:

(NCBI.COM)
$ @getfile sys$batch dua2:[gcg] "ncbi-genbank" "gbbct.seq.Z" -
 gbbct.seq_lz binary ncbi.nlm.nih.gov
$ @getfile sys$batch dua2:[gcg] "ncbi-genbank" "gbest.seq.Z" -
 gbest.seq_lz binary ncbi.nlm.nih.gov
.. (you get the idea)
$ @getfile sys$batch dua2:[gcg] "ncbi-genbank" "gbpri.seq.Z" -
 gbpri.seq_lz binary ncbi.nlm.nih.gov
(END of NCBI.COM)
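
If the list of databases gets long, the same calls can be driven from a loop. This is only a sketch: the symbol names (divs, i, d) and the labels are made up for illustration, the list holds just the three divisions named above, and the GETFILE arguments are copied unchanged from NCBI.COM.

```
$! Hypothetical loop form of NCBI.COM; add further divisions to DIVS as needed.
$ divs = "bct,est,pri"
$ i = 0
$loop:
$ d = f$element(i, ",", divs)          ! next division; returns "," past the end
$ if d .eqs. "," then goto done
$ @getfile sys$batch dua2:[gcg] "ncbi-genbank" "gb''d'.seq.Z" -
 gb'd'.seq_lz binary ncbi.nlm.nih.gov
$ i = i + 1
$ goto loop
$done:
$ exit
```

Inside the quoted file name the symbol needs two leading apostrophes (''d') to be substituted; outside quotes a single pair ('d') is enough.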

Next, I use a set of command files, one per database, to do the actual work:
(GB_EST.COM - example)
$ gcg
$ gcgsupport
$ lzd:==$sys$utils:[lzutils]lzdecompress
$ set def mbcg$dua3:[gcg] ! wherever you want the work to go on
$ def genbankdir 'f$env("default")'  ! point the fake genbankdir to the above dir
$ lzd gbest.seq_lz gb_est.seq   ! do the decompression
$ del gbest.seq_lz; ! delete the compressed original
$ genbanktogcg gb_est.seq /rel=81.0/year=94/MONTH=02/dir=genbankdir:-
/sn=gb_est/ln=gb_est   ! reformat the flat file
$ seqcat genbankdir:gb_est.seq/default  ! generate the catalog files
$ purge genbankdir:gb_est.* ! get rid of the flat file
(END of GB_EST.COM)

The above command file has to be replicated for each of the
GenBank databases you download, and you can even get fancy and replace the
release number, year, month, etc. with some VMS symbols and counters. The
version of LZDecompress I use is about 5 years old and works great.
It should still be on ftp.spc.edu in the [macro32.savesets] directory or
thereabouts. If you'd like all the command files, as well as the LZ programs,
let me know and I can mail them to you.
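
A minimal sketch of the symbols idea, assuming a single generalized command file that takes the division, release, year and month as parameters P1 to P4. The file name GB_DB.COM, the symbol db and the parameter order are hypothetical; the commands themselves are the ones from GB_EST.COM.

```
$! GB_DB.COM (hypothetical name) - generalized form of GB_EST.COM
$! P1 = division (e.g. est)  P2 = release  P3 = year  P4 = month
$ if p1 .eqs. "" .or. p2 .eqs. "" .or. p3 .eqs. "" .or. p4 .eqs. "" then exit
$ db = "gb_" + p1                      ! output name, e.g. GB_EST
$ gcg
$ gcgsupport
$ lzd:==$sys$utils:[lzutils]lzdecompress
$ set def mbcg$dua3:[gcg]              ! wherever you want the work to go on
$ def genbankdir 'f$env("default")'    ! point the fake genbankdir at the work dir
$ lzd gb'p1'.seq_lz 'db'.seq           ! do the decompression
$ del gb'p1'.seq_lz;                   ! delete the compressed original
$ genbanktogcg 'db'.seq /rel='p2'/year='p3'/month='p4'/dir=genbankdir:-
/sn='db'/ln='db'
$ seqcat genbankdir:'db'.seq/default   ! generate the catalog files
$ purge genbankdir:'db'.*              ! get rid of the flat file
```

It could then be queued once per database, for example:

```
$ submit gb_db /param=(est,81.0,94,02)
```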

The one thing to note is that I download, decompress, and reformat everything
in a separate directory, and then, once I've made sure no one is using the
databases in the "real" genbank directory (genbankdir:), I move the new files in.
The whole process takes about 1 day on a ssslllooowww VAX 8200.
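
The final move could look something like the sketch below. It assumes it is run from a session where genbankdir: still has its normal system-wide definition (not the redefinition made inside the batch job), and dua2: is only a stand-in for whichever disk holds the real directory.

```
$! Hypothetical install step for one database, run after the batch job is done.
$ show device/files dua2:                  ! list open files; check nobody has the old ones open
$ copy/log mbcg$dua3:[gcg]gb_est.* genbankdir:
$ purge/log genbankdir:gb_est.*            ! drop the previous release
$ delete/log mbcg$dua3:[gcg]gb_est.*;*     ! free the work space
```

COPY is used rather than RENAME because the work directory and the real directory may be on different disks.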
------------------------------------------------------------------------------
      Richard Kelley                                      phone: (203)679-4896
      University of Connecticut Health Center
      263 Farmington Ave.
      Farmington, CT. 06030-5205                  E-mail : RICHK at MBCG.UCHC.EDU
------------------------------------------------------------------------------


