In message <9310301907.AA04875 at net.bio.net> posted to Bio-Soft, Jean-Loup Risler
wrote:
> Up to now I received the PIR database from MIPS on a magtape. Like many many
> others, I had troubles with the GCG files provided on the tape, which were
> fixed by re-running DBINDEX.
>
> Now I receive PIR on their CD-ROM. Since I want to use GCG, I have to copy
> the files onto my VAX/VMS. *NOTE* this was a pain in the neck, a CD_MOUNT
> followed by a COPY doesn't work because the resulting file are "undefined".
> The CD_ACCESS program from P. Stockwell, available from the EMBL server,
> doesn't work either on these CD-ROMs (it's the only case I know of).
>
> You MUST mount the CD-ROM with the following command:
>
> CD_MOUNT/media=cdrom/UNDEFINED=(STREAM_LF:512)
>
> It was rather hard to be aware of the UNDEFINED switch ... this may be
> useful to other people...
>
> Well, finally it worked. The .REF and .SEQ files seem to be OK. Now, if I
> run PIRTOGCG or DBINDEX, I get the following message for *ALL* the
> sequences:
>
> * no accession number for sequence XXXX *
>
> Anyway the programs go on, I get the usual .NAMES, .OFFSET, etc...files
> whose size seem reasonable.
>
> *BUT* I can't FETCH any sequence. If I try to fetch CCHU from PIR1, for
> example, I get * no files in PIR1:CCHU * .....
>
> NOTE 1: I am lazy and I always wait for a certain time before installing the
> minor releases, waiting for other people to find the new bugs ... :-)
> Hence I'm still under GCG #7.0 Is this the reason?
>
> NOTE 2: The files are NOT in CODATA format. They just look like the good
> old PIR files. However, the accession number is hidden in a line such as:
> c;Accession: xxxxxx
As we understand it, you would like to use the GCG sequence analysis package
with the CD-ROM. Although the ATLAS package is designed to be a standalone
database query/retrieval system, the data may be used in conjunction with other
applications. This summarizes the procedure we believe you are looking for.

Since the GCG index files do not exist on the CD, you must off-load all data
files to disk and create the necessary auxiliary files. The CD_ACCESS program
should work, but after in-house testing we have determined that it does not.
Apparently the CD format is incompatible with our specifications; we are
discussing this issue with the CD publisher.

In order to copy files from the CD to disk, one must use the CD_MOUNT command
with the UNDEFINED_FAT (for non-VMS users that is "undefined file access type"
:-) qualifier, as you have discovered. Two types of files can be retrieved:
1) binary files (PIR1.INX, TERM.TDX)

   If you want the BINARY files such as PIR1.INX, use the following CD_MOUNT
   command with no further file modifications but substituting the correct
   device name for your CD-ROM:

      $ CD_MOUNT/MEDIA=CDROM/UNDEFINED_FAT=(FIXED:NONE:512) $1$dka100:

2) ASCII files (PIR1.SEQ, PIR1.REF)

   The process of downloading ASCII files is a little more complicated.
   The files MUST end up in their native format with the following file
   attributes:

      Record format:      Variable length, maximum XXX bytes
      Record attributes:  Carriage return carriage control

   where XXX is "490" for a .SEQ file and "500" for a .REF file depending on
   the PIR release. After copying the files from CD, the DCL CONVERT command
   must be used to create files with the proper attributes. The following is
   an FDL file describing the resulting file after using the CONVERT utility.
   Extract this file between the "----cut here----" marks (exclusive) and call
   it PIRCD.FDL.
----------------------------------cut here-----------------------------------
IDENT "27-JAN-1993 13:30:42 VAX-11 FDL Editor"
RECORD
        CARRIAGE_CONTROL        carriage_return
        FORMAT                  variable
----------------------------------cut here-----------------------------------
Protocol:

   a) $ CD_MOUNT/MEDIA=CDROM/UNDEFINED_FAT=(STREAM:500) $1$dka100:
         !! DO _NOT_ USE "STREAM_LF" !!
   b) $ COPY $1$dka100:[DATA.NBR]PIR1.SEQ PIR1.SEQTMP
      $ COPY $1$dka100:[DATA.NBR]PIR1.REF PIR1.REFTMP
      ...
   c) $ CONVERT/FDL=PIRCD.FDL PIR1.SEQTMP PIR1.SEQ
      $ CONVERT/FDL=PIRCD.FDL PIR1.REFTMP PIR1.REF
      ...
   d) $ CD_DISMOUNT $1$dka100:

After this procedure use the DCL command DIRECTORY/FULL to make certain the
ASCII files have the proper file attributes.
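
Steps a) through d) could be collected into a single command procedure along
the following lines. This is only an untested sketch: the procedure name
PIRCD_COPY.COM, the symbol names, the check for PIRCD.FDL, and the error
handling are illustrative assumptions. The CD_MOUNT and CD_DISMOUNT commands,
the device name, and the [DATA.NBR] directory are simply taken from the
protocol above and may differ on your system.

```dcl
$! PIRCD_COPY.COM (hypothetical name): untested sketch of steps a) to d)
$! Assumes PIRCD.FDL is in the current default directory.
$ cd_dev = "$1$dka100:"              ! substitute your CD-ROM device
$ files  = "PIR1.SEQ,PIR1.REF"       ! ASCII files to copy, extend as needed
$ IF F$SEARCH("PIRCD.FDL") .EQS. ""
$ THEN
$   WRITE SYS$OUTPUT "PIRCD.FDL not found in the default directory"
$   EXIT
$ ENDIF
$! Step a: mount with STREAM, not STREAM_LF
$ CD_MOUNT/MEDIA=CDROM/UNDEFINED_FAT=(STREAM:500) 'cd_dev'
$ ON ERROR THEN GOTO dismount        ! dismount even if a later step fails
$ i = 0
$next_file:
$ name = F$ELEMENT(i, ",", files)
$ IF name .EQS. "," THEN GOTO verify ! F$ELEMENT returns the delimiter past the end
$! Step b: copy from CD to a temporary file
$ COPY 'cd_dev'[DATA.NBR]'name' 'name'TMP
$! Step c: convert to variable-length records with carriage return control
$ CONVERT/FDL=PIRCD.FDL 'name'TMP 'name'
$ i = i + 1
$ GOTO next_file
$verify:
$ DIRECTORY/FULL 'files'             ! check record format and attributes
$! Step d: release the CD
$dismount:
$ CD_DISMOUNT 'cd_dev'
$ EXIT
```

The temporary .SEQTMP and .REFTMP files are left in place so that a failed
conversion can be repeated without remounting the CD; delete them by hand once
DIRECTORY/FULL shows the expected attributes.
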
Please use the above protocols and report any problems to us. We appreciate
notification of a problem with the ATLAS product. If you like, we will supply
you with a detailed DCL COMMAND procedure to facilitate file manipulation for
future releases.
------------------------------------------------------------------------
Christopher Marzec
MARZEC at NBRF.Georgetown.Edu
Dr. John S. Garavelli
Database Coordinator
Protein Information Resource
National Biomedical Research Foundation
Washington, DC 20007
POSTMAST at GUNBRF.BITNET
POSTMASTER at NBRF.GEORGETOWN.EDU