On Thu, 15 Apr 1999 George Armhold (armhold at topside.rutgers.edu) wrote:
> PS: Is anyone else bothered by the BLAST docs mentioning that the output
> is supposed to be non-parseable?
What is parsable today may not be tomorrow! (I think they are
giving us all a heads-up.)
The GenBank flat-file is not really parsable either, but many people
parse it anyway (NCBI does it!) ... But it's not the richest/best
format for storing the information held in these records. For BLAST
there are better (richer, fuller, more compact) alternatives, which is
what I think they are implying. There is an ASN.1 version of the BLAST
output (from which you can generate various reports: a graphical view
and/or a text view).
Every time you parse a GB flat-file you lose some of the information
and structure that was in the original ASN.1 file. ASN.1 is the format
in which NCBI maintains all of the sequence data that we use on a
regular basis, and the GBFF is simply a report of it, a human-readable
format that everybody _loves_ to parse! (See the wonderful chapter by
Ostell and Kans on the 'NCBI data model' in "Bioinformatics: a
practical guide to the analysis of genes and proteins", edited by
Baxevanis and Ouellette.)
(blatant plug for our book, but presented in the friendly
discussion spirit of this newsgroup ;)
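To make the point about lost structure concrete, here is a toy sketch
in Python. Everything in it is an assumption made up for illustration:
the record layout, the accession, the report format and the function
names are hypothetical and are NOT the real ASN.1 or GenBank formats.
It only shows the general pattern: a structured record is rendered as
a text report, and a naive parser of that report gets strings back
rather than the original typed structure.

```python
# Toy illustration only: the record layout and report format below are
# invented for this sketch and are NOT the real ASN.1 or GenBank formats.

record = {
    "accession": "X00001",  # hypothetical identifier
    "features": [
        {"type": "gene", "location": (1, 300),
         "qualifiers": {"gene": "abc"}},
        {"type": "CDS", "location": (10, 290),
         "qualifiers": {"gene": "abc", "product": "example protein"}},
    ],
}


def render_report(rec):
    """Flatten the structured record into human-readable text lines."""
    lines = ["ACCESSION   " + rec["accession"]]
    for feat in rec["features"]:
        start, end = feat["location"]
        lines.append("     %-10s%d..%d" % (feat["type"], start, end))
        for key, value in feat["qualifiers"].items():
            lines.append('                /%s="%s"' % (key, value))
    return "\n".join(lines)


def parse_report(text):
    """Naive line-based parser: it recovers strings, not the original types."""
    parsed = {"accession": None, "features": []}
    for line in text.splitlines():
        if line.startswith("ACCESSION"):
            parsed["accession"] = line.split()[1]
        elif line.lstrip().startswith("/"):
            # Qualifier line: attach it to the most recent feature.
            key, _, value = line.strip()[1:].partition("=")
            parsed["features"][-1]["qualifiers"][key] = value.strip('"')
        else:
            # Feature line: the location comes back as plain text.
            ftype, location = line.split()
            parsed["features"].append(
                {"type": ftype, "location": location, "qualifiers": {}}
            )
    return parsed


report = render_report(record)
round_trip = parse_report(report)
print(report)
print(round_trip == record)
```

In this sketch the location went in as a pair of integers and comes
back as a piece of text, so the parser has to guess at what the report
writer meant. A more careful parser could recover this particular
case, but only by hard-coding assumptions about a layout that may
change.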
cheers,
f.
--
| B.F. Francis Ouellette tel: (604) 875-3815 |
| Director, Bioinformatics Core Facility fax: (604) 875-3800 |
| Centre for Molecular Medicine and Therapeutics, UBC, Canada |
| francis at cmmt.ubc.ca http://www.cmmt.ubc.ca |
Canadian Bioinformatics Workshop Series:
http://www.cmmt.ubc.ca/bioinformatics/