From: wrp at biochsn.acc.Virginia.EDU (William R. Pearson)
Newsgroups: bionet.software
Organization: University of Virginia
[...]
One of the things I would very much like to see would be a
general set of programs for indexing/extracting sequences from any
standard database format. I find it very frustrating to have to
reformat databases or keep multiple copies, and have taught FASTA about
most popular formats. Unfortunately, although FASTA can read most
libraries, users are often frustrated because the software to extract
sequences is unavailable. I would be happy to donate the code that we
use to index PIR/VMS format and Genbank flat file format (two separate
sets of programs) to get the ball rolling.
[...]
Bill Pearson
In the computer graphics field there is a very simple and efficient way of
handling conversions between the many image formats. The Portable Bitmap
Toolkit (PBM) provides a single 'central' image format along with a set of small
UNIX programs, used as filters, to take TIFF files, for example, into the
central format and then convert that to, say, an X11 bitmap. Various people
have contributed filters and the collection is impressive. All code is, I
believe, public domain - or at least freely available. To perform a
conversion you simply pipe the output of one filter into the input of
another.
I was thinking about setting up an equivalent for sequence files, and
database extraction could readily fall into this. For someone familiar with
UNIX it would be straightforward to use, and some simple wrapper could be
written for novices. Would this be of interest? Is it an appropriate way
to handle this sort of data?
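To make the idea concrete, here is a rough sketch of what one pair of filters
might look like. Everything in it is an assumption for illustration only: the
'central' format (one record per line: name, description and sequence
separated by tabs), the function names, and the use of Python are mine and
not part of any existing toolkit. FASTA is used as the example because it is
the simplest format to parse.

```python
import sys


def _record(header, chunks):
    # Hypothetical central format: name <TAB> description <TAB> sequence.
    name, _, desc = header.partition(" ")
    desc = desc.strip().replace("\t", " ")  # keep the record to three fields
    return "\t".join((name, desc, "".join(chunks)))


def fasta_to_central(lines):
    """Read FASTA lines, yield one central-format record per sequence."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield _record(header, chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield _record(header, chunks)


def central_to_fasta(lines, width=60):
    """Read central-format records, yield FASTA lines wrapped at `width`."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, desc, seq = line.split("\t")
        yield ">" + name + (" " + desc if desc else "")
        for i in range(0, len(seq), width):
            yield seq[i:i + width]


# Illustrative filter names; each reads stdin and writes stdout.
FILTERS = {"fasta2central": fasta_to_central, "central2fasta": central_to_fasta}

if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in FILTERS:
    for out in FILTERS[sys.argv[1]](sys.stdin):
        print(out)
```

Saved as, say, seqfilter.py (a made-up name), the two halves chain with a
pipe in the PBM manner:

    python seqfilter.py fasta2central < in.fasta | python seqfilter.py central2fasta

A reader for PIR or Genbank flat files would only need to emit the same
central records to work with every existing output filter.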
There are a number of format conversion tools out there of varying complexity
and it may be appropriate to adapt one of these rather than start up
something new - but I do like the PBM approach. I'd be willing to put in a
share of the work on something like this.
Robert Jones jones at think.com
Thinking Machines Corporation 245 First Street Cambridge MA 02142