Francis Ouellette francis at CMMT.UBC.CA
Fri Apr 16 09:13:06 EST 1999

On Thu, 15 Apr 1999 George Armhold (armhold at topside.rutgers.edu) wrote:

> PS: Is anyone else bothered by the BLAST docs mentioning that the output
> is supposed to be non-parsable?

What is parsable today may not be parsable tomorrow!  (I think they
are giving us all a heads-up.)

The GenBank flat-file is not really parsable either, but many people
parse it anyway (NCBI does it!) ... But it's not the richest/best
format for storing the information held in these records. For BLAST
there are better (richer, fuller, more compact) alternatives, which is
what I think they are implying. There is an ASN.1 version of the BLAST
output (from which you can generate various reports: a graphical view
and/or a text view).

Every time you parse a GB flat-file you lose some of the information
and structure that was in the original ASN.1 file. ASN.1 is the format
in which NCBI maintains all of the sequence data that we use on a
regular basis; the GBFF is simply a report generated from it, a
human-readable format that everybody _loves_ to parse! (See the
wonderful chapter by Ostell and Kans on the 'NCBI data model' in
"Bioinformatics: A Practical Guide to the Analysis of Genes and
Proteins", edited by Baxevanis and Ouellette.)
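
To make the point about lost structure concrete, here is a minimal
sketch of the kind of column-based flat-file parsing many of us write.
It assumes the usual header layout (keyword starting in column 1,
value starting in column 13, continuation lines indented). The sample
record and the parse_header name are hypothetical, made up for
illustration only; this is not NCBI code.

```python
# Hypothetical, abbreviated flat-file fragment (NOT a real GenBank record).
SAMPLE = """\
LOCUS       EXAMPLE1     120 bp    DNA
DEFINITION  Hypothetical example sequence, used only to
            illustrate continuation lines.
ACCESSION   X00000
"""


def parse_header(text):
    """Collect top-level keyword fields from a flat-file header.

    Assumes keywords occupy columns 1-12 and values start in column 13;
    a line with a blank keyword area continues the previous field.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        keyword = line[:12].strip()
        value = line[12:].strip()
        if keyword:
            current = keyword
            fields[current] = value
        elif current is not None:
            # Continuation line: the original line break is flattened away.
            fields[current] += " " + value
    return fields


header = parse_header(SAMPLE)
print(header["DEFINITION"])
```

Everything comes back as flat strings: the LOCUS line is one
undifferentiated value, and any typed or nested structure the ASN.1
record carried has to be guessed back from column positions.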

(blatant plug for our book, but presented in the friendly
discussion spirit of this newsgroup ;)



| B.F. Francis Ouellette                     tel: (604) 875-3815  | 
| Director, Bioinformatics Core Facility     fax: (604) 875-3800  | 
| Centre for Molecular Medicine and Therapeutics, UBC, Canada     |
| francis at cmmt.ubc.ca                     http://www.cmmt.ubc.ca  |

Canadian Bioinformatics Workshop Series:
