
EMBL to NBRF conversion

Jack Leunissen jackl at caos.kun.nl
Thu Nov 18 04:00:37 EST 1993

Dear colleagues,

A new version of the program EMBL2NBRF is available from the CAOS/CAMM
anonymous FTP-site (host: camms1.caos.kun.nl; dir: pub/molbio/embl2nbrf).

EMBL2NBRF reformats EMBL or Swiss-Prot flatfiles into NBRF-style files,
i.e. SEQ, REF, and (optionally) TTL files. These files can be used with
the GCG package (just run dbindex on them), or with the NBRF programs XQS,
PSQ, and NAQ.
The program compiles and runs on various flavors of UNIX and under VMS.
Its only requirement is an ANSI C compiler. If your system does not
have one, you can use the GCC compiler.

The major new feature of the program is its ability to read data from
standard input (in addition to the existing options to read flatfiles
directly, or via a file-list). It is therefore no longer necessary to copy,
uncompress, or concatenate files before they can be processed. See below
for some examples.


To reformat the new compressed Swiss-Prot file, store the data in NBRF-files
'swissprot.seq' and 'swissprot.ref' (-n flag), store the summary-report in
'swissprot.info' (-s flag), and monitor the progress (-m flag), type:

% zcat sprot27.dat.Z | embl2nbrf -n swissprot -s swissprot.info -m --

To reformat the primates section on the EMBL CDROM, type:

% tr -d '\015' < /cdrom/EMBL/PRI.DAT | embl2nbrf -n em_pr -s em_pr.info --

(The translate command 'tr' strips all carriage returns (<CR>) from the
flatfile on the CDROM.)
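
To see what the 'tr' step does before running it on a large flatfile, the
following minimal sketch may help. It builds a tiny sample file with CR LF
line endings and counts the lines containing a carriage return before and
after stripping. The sample file and its contents are made up for this
illustration; only the 'tr -d' invocation is taken from the example above.

```shell
# Build a small sample file with DOS-style CR LF line endings.
# The file and its contents are hypothetical, for illustration only.
sample=$(mktemp)
printf 'ID   TEST1\r\nSQ   acgt\r\n//\r\n' > "$sample"

# A literal carriage return to use as the grep pattern.
cr=$(printf '\r')

# Count lines that contain a carriage return, before and after stripping.
# grep -c prints 0 and exits non-zero when nothing matches, so "|| true"
# keeps the script going if it is run under "set -e".
before=$(grep -c "$cr" "$sample" || true)
after=$(tr -d '\015' < "$sample" | grep -c "$cr" || true)

echo "lines with CR before: $before, after: $after"
rm -f "$sample"
```

If the "before" count is zero for your own file, the 'tr' step is harmless
but unnecessary.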

To do the same for ALL EMBL flatfiles on the CDROM, AND for the EMBL updates,
which are read from their distribution directory (-d flag) via a file-list
(-f flag), storing everything in one NBRF-style database called 'embl' (the
default name) and converting all sequences to uppercase (-u flag), type:

% cat /cdrom/EMBL/*DAT | tr -d '\015' | embl2nbrf -u -m \
	-- -d /data/embnet/dna/data -f updates.lis 
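
A file-list such as 'updates.lis' could be generated along the following
lines. This is only a sketch: it ASSUMES that the file-list holds one bare
filename per line, relative to the directory given with -d, which the
announcement does not spell out. The directory and the '*.dat' names below
are hypothetical stand-ins for a real update directory.

```shell
# Hypothetical update directory, created here only so the sketch runs.
updates_dir=$(mktemp -d)
touch "$updates_dir/upd1.dat" "$updates_dir/upd2.dat" "$updates_dir/README"

# Write the bare filenames, one per line, to the file-list.
# ASSUMPTION: this is the format expected by the -f flag.
list=$(mktemp)
( cd "$updates_dir" && ls -1 *.dat ) > "$list"

# Inspect the result before handing it to the program.
count=$(wc -l < "$list" | tr -d ' ')
first=$(head -n 1 "$list")
echo "$count files listed, first is $first"

rm -rf "$updates_dir" "$list"
```

Check the documentation shipped with the program for the exact file-list
format before relying on this.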


Program Version: 1.9 (17-Nov-1993)
Syntax: embl2nbrf [-flags] [files] [ [-flags] [files] ]
        --      Read from standard input.
        -a      Append data to existing NBRF-formatted files.
        -d DIR  Read files (in a file-list) from directory DIR.
        -f FOF  Read filenames from file-list FOF.
        -m      Monitor mode on (Report every 1000 entries processed).
        -n NAME Specify filename for output (default is "embl").
        -s SUMM Save summary-report to file SUMM.
        -t      Create title-line file (TTL).
        -u      Convert sequences to uppercase. Default is to ignore case.
        -v      Verbose mode on (Report every entry processed).

      Jack A.M. Leunissen, Ph.D. | CAOS/CAMM Center
      Email: jackl at caos.kun.nl   | University of Nijmegen
      Tel. : +31 80 65 22 48     | Toernooiveld 1
      Fax  : +31 80 65 29 77     | 6525 ED Nijmegen, The Netherlands
    +-------- CAOS/CAMM is the Dutch National Node in EMBnet --------+
