Using the GenBank nightly updates with the IG-Suite

Sunil Maulik maulik at PRESTO.IG.COM
Thu May 9 02:01:40 EST 1991

In response to questions from our users about how to utilize the daily
database updates available on the GenBank On-line Service, I would
like to point out that the TOIGSF program in the IG-Suite solves this
problem.

This posting (long) explains how to create a data bank for use with
the IG-Suite, and then provides step-by-step instructions on updating
the data bank with the nightly GenBank update files.

Please feel free to address any questions concerning this posting to
me at the address below.


Sunil Maulik, Ph.D.
Customer Services
IntelliGenetics, Inc.
(415) 962-7342
FAX: (415) 962-7302

Technical E-mail: ig-consultant at presto.ig.com (Internet)
Personal  E-mail: maulik at presto.ig.com (Internet)
		  ames!ig.com!presto!maulik (Uucp)

---------------------------- cut here -----------------------------------------


You can add your own amino acid, nucleic acid, or key data bank to the
list of available data banks.  You may wish to create a data bank that
holds the sequences added to GenBank, EMBL, PIR, or SWISS-PROT between
regular releases.  The TOIGSF program is now available to convert the
raw data bank sequence files to IntelliGenetics format sequence files.
When you want to search all of the sequences in a data bank, you
should then select both the data bank of the current release and the
data bank with the new sequences that you created.  You only need to
create the data bank once, and you can continue to add new files to it
with TOIGSF.  The newly created data bank will appear on the list of
data banks available for searches in the QUEST, FASTDB, BIFIND, and
IFIND programs.

1.  Make an "igdatabanks.par" File

Each data bank must have an entry in an "igdatabanks.par" file in the
"/usr/igsw/igv54/runtimes" directory (Sun) or the IGRUNTIME: directory
(VAX).  (An example of such a file is in the Sun
"/usr/igsw/igv54/examples/igdatabanks.par" file or the VAX
"[IG]igdatabanks.vms" file.) Copy this file, rename it if necessary,
and edit it.  The contents of this file must be in the following
format:

<NAME>,<ABB>,<TYPE>,SEQUENCE FILE,<DIRECTORY>

Example: New-GenBank,ng,NUCLEIC ACID,SEQUENCE FILE,/pr0/joe/	(Sun)

<NAME> is the name of the data bank; this is the name displayed in the
list of available data banks (New-GenBank in the example) for the
data bank searching programs.

<ABB> is the one-, two-, or three-letter abbreviation of the data bank
name (ng in the example). The data bank consists of all of the entries
in all of the files in the directory you specified that have the ABB
file name extension (all ".ng" files in the "/pr0/joe/" directory
(Sun) or $DISK1:[JOE] directory (VAX) in this example).  The ABB
filename extension must be in lower case letters. 

<TYPE> is NUCLEIC ACID, AMINO ACID, or KEY, depending on what kind of
sequence is in the data bank (NUCLEIC ACID in the example); all entries
in the data bank must be of the same type.  SEQUENCE FILE indicates
that the sequences are in IntelliGenetics Suite sequence file format.

<DIRECTORY> is the computer directory in which the sequence files or
key files are located ("/pr0/joe/" (Sun) or $DISK1:[JOE] (VAX) in this
example); you must set the permissions on this directory and its files
so that they are readable by the users of the data bank.  
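
The field rules above can be checked mechanically before the file is
installed.  The following is only a sketch for the Sun side: the
function name check_par_entry is made up for this illustration, it
assumes that no field contains a comma, and it tests the fourth field
only for the SEQUENCE FILE case described in this posting.

```shell
#!/bin/sh
# Check one igdatabanks.par entry against the field rules described above.
# Assumption: fields are separated by commas and none contains a comma.
# check_par_entry is a hypothetical helper, not part of the IG-Suite.
check_par_entry() {
    printf '%s\n' "$1" | LC_ALL=C awk -F',' '
        NF != 5 { print "expected 5 fields, got " NF; exit 1 }
        # ABB: one to three lower case letters
        $2 !~ /^[a-z][a-z]?[a-z]?$/ { print "bad ABB: " $2; exit 1 }
        $3 != "NUCLEIC ACID" && $3 != "AMINO ACID" && $3 != "KEY" {
            print "unknown TYPE: " $3; exit 1
        }
        $4 != "SEQUENCE FILE" { print "unexpected format field: " $4; exit 1 }
        { print "ok" }
    '
}

check_par_entry 'New-GenBank,ng,NUCLEIC ACID,SEQUENCE FILE,/pr0/joe/'
```

The function prints "ok" and returns 0 for a well-formed entry;
otherwise it prints the first problem found and returns 1.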

2.  Make an ".IDB" File

Each data bank you make must also have an ".IDB" file, which must have
the name <ABB>.IDB (ng.IDB in this example); IDB must be in uppercase
letters. (An example of such a file is in the Sun
"/usr/igsw/igv54/examples/SEQ.IDB" file or the VAX "[IG]SEQ.IDB"
file.) Copy this file and edit it or make this file with a text
editor. The file must be in the same directory as the sequence files
for the data bank to be recognized (in this example, the file is
"/pr0/joe/ng.IDB" (Sun) or $DISK1:[JOE]ng.IDB (VAX) ).  Entries in the
".IDB" file have the format:

TYPE <type>
TOTAL <##>

HEADER INFORMATION is one or more lines of free format information; it
may give the data bank name, the release number, the date, etc.  The
line after the HEADER INFORMATION consists of 65 hyphens.  <#> is the
release number that you assign to this data bank; it will appear after
the data bank name in the list of data banks.  <phrase> is the short
description of the data bank; it will appear after the data bank name
and release number in the list of data banks. The short description
should have no more than 50 characters.  <type> is AMINO ACID, NUCLEIC
ACID, or KEY.  <##> is the number of sequences in the data bank.  For
example, the ng.IDB file in our example is:

                     New-GenBank Data Bank
                             Release 1
                             April 1991
DESCRIPTION New GenBank Sequences

3.  Check Your Work

When you have created a proper "igdatabanks.par" file, a proper ".IDB"
file, and IntelliGenetics format sequence or key files with the proper
file name extension, the next time you run a data bank searching
program, the data bank will appear in the list of available data
banks. In this example, the data bank list would be:

File		- User-specified or indirect file
A-GENESEQ 2	- Patented amino acid sequences data bank 
New-GenBank 1	- New GenBank Sequences
GenBank 67	- GenBank nucleic acid sequences data bank
PIR 23		- Protein Identification Resource sequence data bank 
SWISS-PROT 17 	- University of Geneva protein sequence data bank
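
If the data bank does not show up in the list, a quick look at the
directory often finds the cause.  The sketch below (Sun side only)
tests for the pieces described in sections 1 and 2: a readable
<ABB>.IDB file with a line of 65 hyphens, a TYPE line and a TOTAL
line, and at least one file with the lower case ABB extension.  The
function name check_databank is invented for this example, and the
script assumes the TYPE and TOTAL keywords start in the first column
as shown above.

```shell
#!/bin/sh
# Pre-flight check for a data bank directory.
# Usage: check_databank <directory> <abb>   e.g. check_databank /pr0/joe ng
# check_databank is a hypothetical helper, not part of the IG-Suite.
check_databank() {
    dir=$1 abb=$2 status=0
    idb="$dir/$abb.IDB"
    if [ ! -r "$idb" ]; then
        echo "missing or unreadable: $idb"; status=1
    else
        # The line after the header information consists of 65 hyphens.
        grep -q '^-\{65\}$' "$idb" || { echo "no 65-hyphen line in $idb"; status=1; }
        grep -q '^TYPE ' "$idb"     || { echo "no TYPE line in $idb"; status=1; }
        grep -q '^TOTAL ' "$idb"    || { echo "no TOTAL line in $idb"; status=1; }
    fi
    # At least one sequence file with the lower case ABB extension.
    set -- "$dir"/*."$abb"
    [ -e "$1" ] || { echo "no .$abb files in $dir"; status=1; }
    return $status
}
```

The function prints one line per problem and returns a non-zero
status if anything is missing; it does not check the igdatabanks.par
entry or the file permissions.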


To update the data bank with the nightly GenBank update files, follow
these steps:

1. Log in as the root or system user:

	su to root (Suns)
	login as SYSTEM (VAX)

2. Make sure you are in the directory where you want the data files
containing the nightly updates to reside:

	cd /pr0/joe (Suns)

3. Connect to the GenBank Online Service over the Internet using the
FTP program:

	ftp genbank.bio.net
	login: anonymous
	password: your-last-name

4. Change directories to the directory containing the nightly update data:

	cd /pub/db/gb-newdata

5. Get the README file and determine if the new sequences replace the
previous gbupdate or if they must be appended to the previous file.
(They will replace it only if you have just obtained a new GenBank
update from IntelliGenetics.)

5a. Set the FTP file transfer mode to binary (the file is compressed):

	binary

6. Use the GET command to obtain the file containing the cumulative
nightly update:

	get gbseq.all.Z

7. Once the file has been completely transferred, terminate the FTP connection:

	bye (or exit)

8. Uncompress the file containing the updates:

	zcat gbseq.all.Z > gbseq.all (Sun)
	On VAX systems, use the appropriate VMS decompression routine.

9. Remove the original compressed file:

	 rm gbseq.all.Z (Sun) 
	 DEL gbseq.all.Z.* (VAX)

10. Run the TOIGSF program and convert the uncompressed file to
IntelliGenetics sequence file format:

		input file gbseq.all
		input format genbank nucleic acids
		run the conversion

11. TOIGSF will make one file (or more, if there are over 100 sequence
entries) with the name xxx_toigsfn.seq. Once these files have been
created, delete the original uncompressed file:

	rm gbseq.all (Sun)
	DEL gbseq.all;* (VAX)

12. If the sequences are to replace the previous update then:

	rm gbupdate.ng
	cat *.seq > gbupdate.ng (Sun)

	DEL gbupdate.ng;*
	APPEND *.seq gbupdate.ng (VAX)

13. If the sequences are to be added to the previous update then:

	cat *.seq >> gbupdate.ng (Sun)
	APPEND *.seq gbupdate.ng (VAX)

14. Remove the .seq files created by TOIGSF:

	rm *.seq (Sun)
	DEL *.seq;* (VAX)

15. Edit ng.IDB to reflect the new update date AND the appropriate
number of sequences.  If the update was replaced, use the number of
sequences reported by TOIGSF.  If the update was appended then add the
old number to the number TOIGSF reports.

16. Alert users to the new data by editing /etc/motd (Sun) or
SYLOGIN.COM (VAX) to reflect the new update.
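
For sites that repeat this every night, steps 12 through 15 can be
wrapped in a small shell function on the Sun.  This is a sketch
rather than a supported tool: the name merge_update is invented here,
the count must be typed in from what TOIGSF reports, the function
must be run in the data bank directory, and it assumes ng.IDB
contains exactly one TOTAL line.  The update date in the header still
has to be edited by hand.

```shell
#!/bin/sh
# Sketch of steps 12-15 (Sun).  Run in the data bank directory after
# TOIGSF has written its .seq files.
# Usage: merge_update replace|append <number of sequences reported by TOIGSF>
# merge_update is a hypothetical helper, not part of the IG-Suite.
merge_update() {
    mode=$1 new=$2 bank=gbupdate.ng idb=ng.IDB
    case $mode in
        replace)
            # Step 12: the new sequences replace the previous update.
            cat ./*.seq > "$bank"
            total=$new ;;
        append)
            # Step 13: the new sequences are added to the previous update.
            cat ./*.seq >> "$bank"
            old=$(awk '$1 == "TOTAL" { print $2 }' "$idb")
            total=$((old + new)) ;;
        *)
            echo "usage: merge_update replace|append <count>" >&2
            return 1 ;;
    esac
    # Step 14: remove the .seq files created by TOIGSF.
    rm ./*.seq
    # Step 15: rewrite the TOTAL line with the new number of sequences.
    sed "s/^TOTAL .*/TOTAL $total/" "$idb" > "$idb.tmp" && mv "$idb.tmp" "$idb"
    echo "$total"
}
```

For example, "merge_update append 42" adds the converted sequences to
gbupdate.ng, raises the TOTAL in ng.IDB by 42, and prints the new
total.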
