Tue Aug 16 16:15:19 EST 1994

		SAGITTARIUS PIR-41 (30 June 1994) variant

    SAGITTARIUS PIR is a highly compact databank variant of the original PIR 
database, designed to help individual researchers and software developers 
use sequence database information without large storage requirements. It 
contains custom-compressed PIR information and an interface written in C 
which allows fast direct access to the stored information without full 
decompression of the corresponding files. Starting with this PIR-41 version, 
the same databank files and the same interface C-file can be used on both 
PC-compatibles and UNIX V computers (a Mac interface version is forthcoming) 
without any modifications. The interface supports all standard PIR Request 
Network queries (i.e. get the databank SEQ number by entry; for a given 
databank SEQ number, get specified information such as name, organism(s), 
keyword(s), sequence, sequence features with coordinates etc.). Unlike the 
PIR Request Network, SAGITTARIUS PIR allows you to access PIR information 
directly from your C program, even on a personal computer not connected to 
any network. In addition, all numerical information, such as intron placement 
and feature coordinates, is passed to the calling program as numeric values 
rather than text strings, which simplifies (sub)sequence manipulations. 
For even greater compactness and flexibility, SAGITTARIUS PIR is organized 
as separate file sets, where each file set contains database information of 
one independent type (i.e. sequences, entry indexes, organisms etc.). On a 
particular computer, the user can easily change the installed configuration 
of PIR information as needed, without affecting retrieval of the other types 
of stored information. For example, the file set which contains the protein 
sequences themselves and their PIR entry indexes (including reverse index 
arrays) takes less than 12 Mb of disk space.
    For PC-compatibles (with a forthcoming version for UNIX V/X Window), a 
dialog shell is available which supports all standard PIR Request Network 
queries plus homology searches, alignments etc.
    SAGITTARIUS PIR is distributed freely (all databank file sets, the 
interface C-file, the test/example program C-file and the PC-shell 
executable) via anonymous FTP. File sets and the interface can be 
used/included in any commercially distributed package without any 
restrictions. Consultations and advanced interface variants (currently used 
to support fast, efficient database manipulations in other SAGITTARIUS 
family packages) are available from the developers upon request.

  At present, the SAGITTARIUS PIR databank stores the following original 
information types (fields) of the PIR database in custom-compressed form:

	  - database entry index 
	  - accession number(s) 
	  - other (non-PIR) database crossreference(s)
	  - protein name 
	  - organism name(s) 
	  - alternative protein name(s)
	  - keyword(s)
	  - superfamily name(s)
	  - gene name(s)
	  - map position(s)
	  - unusual start codon(s)
	  - intron(s) placement
	  - literature reference(s), including for each:
	  	- journal or citation 
	  	- free-format comment
	  - sequence feature(s)
	  - free-format comment
	  - protein sequence itself

  For PIR-41, all bank files take 33+ Mb on hard disk (20+ Mb in 
ZIP-compressed form). Each original database information field (i.e.
sequences, organisms, names, keywords etc.) is stored in a separate file set, 
which allows the user to configure reduced bank variants by simply leaving 
unneeded information files unpacked. For example, omitting the 
literature references reduces the bank to only 23 Mb. The core variant of 
the databank files (the minimal configuration supported by the available 
PC-shell) includes only indexes and sequences. All more complete 
configurations can be produced by simply adding (unpacking from the 
distribution) the corresponding file sets.

     List of distribution files with descriptions of their decompressed contents

ZipFile    ZipSize                Content Description
------------------------------------------------------\/  Core config part  \/
CORE    11,105,308                Entry indexes + sequences themselves
------------------------------------------------------\/ User-variable part \/
NAME       991,927                Sequence names
ORGANISM   488,638                Organisms
KEYWORD    270,868                Keywords
S_FAMILY   156,485                Superfamily classifications
CROSSREF   284,114                Other database crossreferences
FEATURE    850,677                Sequence features
GENE_MAP    25,803                Genetic map positions
ALT_NAME   174,754                Sequence alternative names
GENE       163,812                Genes
CODON        7,543                Unusual start codons
ACC_CODE 1,394,839                PIR accession codes
COMMENT    256,560                Sequence comments
INTRON      30,991                Intron(s) placement
REF_JOU    897,568                References core : references themselves
REF_AUTH 1,593,198                 Ref. extension : reference authors
REF_TITL 1,840,525                 Ref. extension : reference titles
REF_COMM    92,982                 Ref. extension : reference comments
------------------------------------------------------\/ Dialog shell for PC \/
PC_SHELL   146,967                PC-executable + two MAP-files (to \PIR)
------------------------------------------------------\/       Interface     \/
INTERFAC    15,383                Interface and test program, C-files, PRJ file

  SAGITTARIUS PIR Automated Sequence Bank is a dialog shell for manipulating 
the compressed sequence database information. It is aimed at 
MS DOS/Windows PC-compatibles with an installed hard disk optimizer 
(a disk cache such as Smartdrive, Hyperdisk, Ncache etc.). A 386 or 486 is 
recommended; 86 and 286 machines will be significantly slower but are still 
usable. The recommended minimum memory allocation for the disk optimizer is 
512 Kb, but the shell remains usable (though significantly slower) even 
without any optimizer. 

The dialog data shell supports the following main operations:
   - selection of sequences to bank buffer by
        - dictionary-defined record for specified informational
          field (name, source, keyword, feature etc.)
        - user-defined context in specified informational
          field (name, source, keyword, feature etc.)
        - set of dictionary-defined records for different informational
          fields (source, keyword, superfamily etc.)
        - perfect or imperfect SEQ homology with a user-defined short sequence
   - store and retrieve buffer content (SEQ bank numbers and indexes)
     between sessions
   - output user-specified (buffer) SEQ data to disk files
   - fast SEQ homology searches (for a user-defined SEQ of no more 
     than 50-100 positions, only 1 hour with the full PIR bank on a 486/33)
   - fast subregion-sensitive pairwise alignments (user-defined
     sequence with buffer SEQs or the full bank)
   - easy data access from user programs (C) as a support for
     applications development 

    SAGITTARIUS databank files are rebuilt from the currently available 
PIR database information only by the distributors (usually 2 to 4 times a 
year). The distribution includes ready-to-use information files, the 
interface and executables, all in compressed form.


  SAGITTARIUS PIR is available by anonymous FTP from:

     FTP.SCRI.FSU.EDU, directory /pub/genetics/pir/

  SAGITTARIUS PIR is also available by anonymous FTP from some 
of the well-known bio-servers (IUBIO etc.).


			Installation on UNIX V system

  All decompressed SAGITTARIUS databank files must be placed in a single 
directory, whose name must be correctly specified in the interface 
C-file (first strings). Placing the 'X' symbol in the first position of 
the corresponding text string (instead of '/') forces the interface to carry
out a formal check for the presence of the databank files.

			Installation on PC

  If you plan to use the PC-shell, all decompressed SAGITTARIUS databank 
files must be placed in the directory \PIR on any single logical drive. 
Otherwise, all decompressed SAGITTARIUS databank files must be placed in a 
single directory, whose name must be correctly specified in the interface 
C-file (first strings). Placing the 'X' symbol in the first position of the 
corresponding text string (instead of a drive letter) forces the interface 
to 1) search for the corresponding directory on all available logical drives 
and 2) carry out a formal check for the presence of the databank files.

  The bank executable BANK.EXE may be placed in any directory on any logical 
drive. Important: the .MAP files contained in the same PC-shell compressed 
file must be moved into the \PIR directory where the other databank files 
are placed.

  It is highly recommended to run BANK.EXE from a directory (and/or logical 
drive) other than the data location, to avoid accidental damage to the bank 
file structure. The bank is designed for access to data on a file server and 
can find the \PIR directory (and test it for a correct data configuration) 
on any logical drive.



This package (with compressed data files) can be redistributed
freely without any limitations but only free of charge and for 
non-commercial usage. No changes in data files and/or executables 
are allowed.

You may include compressed SAGITTARIUS datafiles in your application 
packages freely even in the case of any commercial usage.


For HELPFUL comments and discussions please contact

	Dr. Victor B. Strelets (strelets at scri.fsu.edu)

	Computational Genetics and Biophysics,
	Supercomputer Computations Research Institute, 
	FSU B-186, Tallahassee, FL 32306-4052, USA


For verification purposes you may use the following information about the 
SAGITTARIUS PIR distribution files (with the names and sizes of the files 
contained in each archive):

CORE     ZIP    11,105,308
	SEQ0     BAN       299,040
	SEQ      BAN     9,613,474
	DIC      BAN       330,888
	DIC2     BAN       106,496
	IND0     BAN       299,040
	IND2     BAN       301,056
	IND      BAN       598,080
	0IND     REV       299,040
	1IND     REV       299,040

NAME     ZIP       991,927
	NAM0     BAN       299,040
	NAM2     BAN       180,224
	NAM      BAN       929,160
	0NAM     REV       179,196
	1NAM     REV       299,040

ORGANISM ZIP       488,638
	SOU0     BAN       299,040
	SOU1     BAN        78,468
	SOU2     BAN        24,576
	SOU      BAN       105,816
	0SOU     REV        23,972
	1SOU     REV       461,716

KEYWORD  ZIP       270,868
	KW0      BAN       298,936
	KW1      BAN        98,916
	KW2      BAN         6,144
	KW       BAN        20,536
	0KW      REV         5,560
	1KW      REV       270,868

GENE     ZIP       163,812
	GENE0    BAN       286,272
	GENE1    BAN        37,176
	GENE2    BAN        30,720
	GENE     BAN        75,168
	0GENE    REV        29,748
	1GENE    REV        50,200

ALT_NAME ZIP       174,754
	ANAM0    BAN       298,892
	ANAM1    BAN        35,376
	ANAM2    BAN        28,672
	ANAM     BAN       116,104
	0ANAM    REV        28,560
	1ANAM    REV        48,004

S_FAMILY ZIP       156,485
	SFAM0    BAN       293,424
	SFAM1    BAN        26,116
	SFAM2    BAN        14,336
	SFAM     BAN        58,840
	0SFAM    REV        13,520
	1SFAM    REV       123,060

ACC_CODE ZIP     1,394,839
	AC0      BAN       299,040
	AC1      BAN       334,516
	AC2      BAN       333,824
	AC       BAN       665,760
	0AC      REV       332,880
	1AC      REV       332,880

CODON    ZIP         7,543
	CDN      BAN           192
	CDN0     BAN       216,476
	CDN1     BAN           148
	CDN2     BAN         2,048
	0CDN     REV            96
	1CDN     REV         3,352

FEATURE  ZIP       850,677
	FT0      BAN       298,672
	FT1      BAN       222,132
	FT2      BAN        47,104
	FT       BAN       259,144
	FTN0     BAN       298,672
	FTN1     BAN       222,132
	FTN      BAN       904,984
	0FT      REV        46,272
	1FT      REV       214,004

GENE_MAP ZIP        25,803
	MAP0     BAN       277,236
	MAP      BAN        29,480

COMMENT  ZIP       256,560
	CC0      BAN       283,604
	CC       BAN       468,496

CROSSREF ZIP       284,114
	CR0      BAN       299,040
	CR       BAN       362,176

INTRON   ZIP        30,991
	INTR0    BAN       207,764
	INTR1    BAN        39,084

REF_JOU  ZIP       897,568
	REF0     BAN       299,040
	REF1     BAN       403,020
	REF      BAN     1,109,376

REF_AUTH ZIP     1,593,198
	AUT0     BAN       299,040
	AUT1     BAN       403,020
	AUT      BAN     2,419,248

REF_TITL ZIP     1,840,525
	TITLE0   BAN       299,040
	TITLE1   BAN       365,264
	TITLE    BAN     2,978,768

REF_COMM ZIP        92,982
	REFCOM0  BAN       299,020
	REFCOM1  BAN        49,324
	REFCOM   BAN       105,032

PC_SHELL ZIP       146,967
	BANK     EXE       468,640
	BANK1    MAP       151,200
	BANK2    MAP       151,200

Standard disclaimer:
Author(s) will in no way be held liable for any loss of profit or 
any other commercial damage including but not limited to special, 
incidental, consequential or other damages from use of this 
package. You may use the software and datafiles only with the 
understanding that you use them at your own risk and that your use 
of them constitutes your agreement to this disclaimer. 
