Database file format conversion

Robert Jones jones at Think.COM
Mon Sep 30 08:00:16 EST 1991

   From: wrp at biochsn.acc.Virginia.EDU (William R. Pearson)
   Newsgroups: bionet.software
   Organization: University of Virginia
   One of the things I would very much like to see would be a
   general set of programs for indexing/extracting sequences from any
   standard database format. I find it very frustrating to have to
   reformat databases or keep multiple copies, and have taught FASTA about
   most popular formats.  Unfortunately, although FASTA can read most
   libraries, users are often frustrated because the software to extract
   sequences is unavailable.  I would be happy to donate the code that we
   use to index PIR/VMS format and Genbank flat file format (two separate
   sets of programs) to get the ball rolling.
   Bill Pearson
In the computer graphics field there is a very simple and efficient way of
handling conversions between the many image formats. The Portable Bitmap
Toolkit (PBM) provides a single 'central' image format along with a set of
small UNIX programs, used as filters, to take TIFF files, for example, into
the central format and then convert that to, say, an X11 bitmap. Various
people have contributed filters and the collection is impressive. All code
is, I believe, public domain - or at least freely available. To perform a
conversion you simply pipe the output of one filter into the input of
another.
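
To make the appeal of a central format concrete, here is a small
back-of-the-envelope calculation in Python. It is only an illustration: the
function names are made up for this note, and it assumes every format needs
both a reader and a writer. With N formats, direct conversion needs a program
for every ordered pair of formats, while the central-format approach needs
one filter into the central format and one out of it per format.

```python
def pairwise_converters(n_formats: int) -> int:
    """Direct converters needed: one per ordered pair of distinct formats."""
    return n_formats * (n_formats - 1)


def hub_filters(n_formats: int) -> int:
    """Filters needed with a central format: one in and one out per format."""
    return 2 * n_formats


# Compare the two approaches for a few collection sizes.
for n in (3, 5, 10):
    print(n, "formats:", pairwise_converters(n), "direct converters vs",
          hub_filters(n), "filters")
```

The gap widens quickly as formats are added, and each new format costs only
two new filters rather than a converter for every existing format.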

I was thinking about setting up an equivalent for sequence files, and
database extraction could readily fall under it. For someone familiar with
UNIX it would be straightforward to use, and some simple wrapper could be
written for novices. Would this be of interest? Is it an appropriate way
to handle this sort of data?
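
As a rough sketch of what a pair of such filters might look like, the Python
below converts FASTA-style text into a central format and back. Everything
about the central format here is an assumption made for illustration (one
record per line, with identifier, description and sequence separated by
tabs), and the function names are hypothetical; no such format has been
agreed on. Filters for PIR or GenBank files would be written along the same
lines and are not shown.

```python
from typing import Iterable, Iterator, List


def _central_record(header: str, chunks: List[str]) -> str:
    """Build one line of the (hypothetical) central format."""
    ident, _, description = header.strip().partition(" ")
    # Tabs are the field separator, so keep them out of the description.
    description = description.strip().replace("\t", " ")
    return "\t".join((ident, description, "".join(chunks)))


def fasta_to_central(lines: Iterable[str]) -> Iterator[str]:
    """Filter: FASTA text in, one central-format record per sequence out."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield _central_record(header, chunks)
            header, chunks = line[1:], []
        elif line.strip() and header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield _central_record(header, chunks)


def central_to_fasta(lines: Iterable[str], width: int = 60) -> Iterator[str]:
    """Filter: central-format records in, FASTA text lines out."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        ident, description, sequence = line.split("\t")
        yield (">" + ident + " " + description).rstrip()
        # Wrap the sequence to the requested line width.
        for start in range(0, len(sequence), width):
            yield sequence[start:start + width]


# Chain the two filters, much as a shell pipe would chain two programs.
sample = [">seq1 example protein", "MKT", "AYI", ">seq2", "GATTACA"]
for out_line in central_to_fasta(fasta_to_central(sample), width=4):
    print(out_line)
```

In a real toolkit each function would be a separate small program reading
standard input and writing standard output, so that any reader could be
piped into any writer.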

There are a number of format conversion tools out there of varying complexity
and it may be appropriate to adapt one of these rather than start up
something new - but I do like the PBM approach. I'd be willing to put in a
share of the work on something like this.

Robert Jones    jones at think.com

Thinking Machines Corporation  245 First Street  Cambridge  MA 02142
