In article <9109301300.AA05395 at lux.think.com> jones at Think.COM (Robert Jones) writes:
>In the computer graphics field there is a very simple and efficient
>way of handling conversions between the many image formats. The
>Portable Bitmap Toolkit (PBM) provides a single 'central' image format
>along with a set of small unix programs, used as filters, to take TIFF
>files, for example, into the central format and then convert that to,
>say, an X11 bitmap. Various people have contributed filters and the
>collection is impressive. All code is, I believe, public domain - or
>at least freely available. To perform a conversion you simply pipe the
>output of one filter into the input of another.
>
>I was thinking about setting up an equivalent for sequence files and
>database extraction could readily fall into this. For someone familiar
>with UNIX it would be straightforward to use and some simple wrapper
>could be written for novices. Would this be of interest? Is it an
>appropriate way to handle this sort of data?
> ...
>Robert Jones jones at think.com
I am not so enthusiastic about a portable filter approach, for
two reasons:
1) The number one concern in searching databases is speed.
Portable filters might easily double the amount of time required to
search a database. Alternatively, it may not be possible to implement
them at all, since FASTA and BLAST need to go back into the database a
second time to reread a subset of sequences. That requires random
access, which a pipe between filters does not provide.
2) There aren't that many library formats (currently < 10),
and the number is unlikely to increase significantly. Databases are
published by large organizations that change slowly; it is unlikely
that 20 database formats will ever be required. What is needed is a
simple interface to library-reading routines that is efficient to
implement. One of the shortcomings of BLAST is that it reads only
libraries in its own format.
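
To make the two points above concrete, here is a minimal sketch in
Python of what such a library-reading interface might look like. It is
not the code used by FASTA or BLAST; the class name FastaLibrary and
the methods scan() and fetch() are hypothetical, and it assumes a
library in plain FASTA format (a ">" header line followed by sequence
lines). The first pass records a byte offset for each entry, and the
second pass seeks straight to the entries of interest, which is the
step that a non-seekable pipe cannot support.

```python
import io
from typing import BinaryIO, Dict, Iterator, List, Optional, Tuple


class FastaLibrary:
    """Hypothetical reader: one sequential scan, then random-access rereads."""

    def __init__(self, handle: BinaryIO) -> None:
        # A pipe (for example, the output of a filter) is not seekable,
        # so the second pass below would be impossible on it.
        if not handle.seekable():
            raise ValueError("library must be seekable; a pipe is not")
        self._handle = handle
        self._offsets: Dict[str, int] = {}

    def scan(self) -> Iterator[Tuple[str, str]]:
        """First pass: yield (name, sequence), recording each entry's offset."""
        self._handle.seek(0)
        name: Optional[str] = None
        start = 0
        parts: List[str] = []
        while True:
            pos = self._handle.tell()
            line = self._handle.readline()
            if not line or line.startswith(b">"):
                if name is not None:
                    self._offsets[name] = start
                    yield name, "".join(parts)
                if not line:
                    return
                fields = line[1:].split()
                name = fields[0].decode() if fields else ""
                start, parts = pos, []
            else:
                parts.append(line.strip().decode())

    def fetch(self, name: str) -> str:
        """Second pass: seek to a recorded entry and reread its sequence."""
        saved = self._handle.tell()
        self._handle.seek(self._offsets[name])
        self._handle.readline()  # skip the ">" header line
        parts: List[str] = []
        for line in iter(self._handle.readline, b""):
            if line.startswith(b">"):
                break
            parts.append(line.strip().decode())
        self._handle.seek(saved)  # leave any scan in progress undisturbed
        return "".join(parts)


# Usage with a tiny in-memory library (made-up sequences).
library = FastaLibrary(io.BytesIO(b">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n"))
names = [name for name, _ in library.scan()]
reread = library.fetch("seq2")
```

A reader for another library format would only need to supply the same
two operations; the search code itself would not change. Whether two
methods are enough for a real search program is an open question that
this sketch does not settle.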
Bill Pearson