Parsing framework (BlastXML), was Re: FASTA format - proposed

Andrew Dalke dalke at bioreason.com
Tue Nov 24 21:46:09 EST 1998

Wayne Parrott <wayne at workingobjects.com> said:
> The relevance of BlastXML to Andrew's post is that it defines an event
> handler known as the BlastElementHandler (i.e., a callback interface).
> BlastElementHandler defines the event notification interface used by a
> Blast parser when it recognizes a significant element. By subclassing
> from BlastElementHandler developers will be able to map event/callback
> data to their own object-model(s). 
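To make the callback idea concrete, here is a minimal Python sketch of such an interface. The method names (on_database, on_hit, on_hsp), the HitCollector subclass and the replay() driver are all hypothetical. The post does not give BlastElementHandler's actual methods, and replay() only stands in for a real parser by firing pre-built events.

```python
# Hypothetical sketch: BlastElementHandler's real method names are not given
# in the post, so on_database/on_hit/on_hsp are illustrative only.

class BlastElementHandler:
    """Callback interface: a parser calls these as it recognizes elements."""

    def on_database(self, name):
        pass  # default: ignore the event

    def on_hit(self, hit_id, description):
        pass

    def on_hsp(self, score, evalue):
        pass


class HitCollector(BlastElementHandler):
    """Maps callback data onto a caller-defined object model (plain dicts here)."""

    def __init__(self):
        self.database = None
        self.hits = []

    def on_database(self, name):
        self.database = name

    def on_hit(self, hit_id, description):
        self.hits.append({"id": hit_id, "description": description, "hsps": []})

    def on_hsp(self, score, evalue):
        # An HSP belongs to the most recently reported hit.
        self.hits[-1]["hsps"].append((score, evalue))


def replay(events, handler):
    """Stand-in for a real parser: dispatch (event, args) pairs to the handler."""
    for name, args in events:
        getattr(handler, "on_" + name)(*args)


# Made-up events for illustration only.
events = [
    ("database", ("swissprot",)),
    ("hit", ("hit_1", "example protein")),
    ("hsp", (250, 1e-60)),
    ("hsp", (40, 0.002)),
]
collector = HitCollector()
replay(events, collector)
print(collector.database, len(collector.hits), collector.hits[0]["hsps"])
```

The point of the design is that the parser never needs to know about HitCollector; any subclass can build whatever objects it likes from the same events.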

I haven't responded thusly to a post in a long time but ...


(Okay, I'm over that.)

> While XML is the underlying stream protocol, the upper layers deal
> only with Blast result semantics, e.g., database, score, HSP, ... 

The difficulty we had with that came when working with people who've
been staring at BLAST output for years.  They expect to see a certain
layout, so output that keeps only the semantic data seemed to be a
harder sell.  YMMV.  And I suppose that further down the road you
would have a stylesheet (DSSSL, is that the name?) to render the XML
back to the current layout.  But then I'll probably have to learn Scheme...
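A rough idea of what rendering semantic XML back into a familiar layout could look like, written with Python's xml.etree.ElementTree rather than a stylesheet language. The XML vocabulary (blast, hit, hsp and their attributes) is invented for illustration and is not the BlastXML DTD, and the output only approximates the look of a BLAST summary table.

```python
import xml.etree.ElementTree as ET

# Made-up XML: element and attribute names are assumptions, not
# WorkingObjects' DTD or any NCBI format.
DOC = """<blast program="blastp" database="swissprot">
  <hit id="hit_1" description="example protein">
    <hsp score="250" evalue="1e-60"/>
  </hit>
  <hit id="hit_2" description="another protein">
    <hsp score="40" evalue="0.002"/>
  </hit>
</blast>"""


def render_summary(xml_text):
    """Turn semantic XML back into a fixed-width, BLAST-style summary table."""
    root = ET.fromstring(xml_text)
    lines = [
        f"{root.get('program').upper()} against {root.get('database')}",
        "",
        f"{'Sequences producing significant alignments:':<50}{'Score':>7}{'E':>9}",
    ]
    for hit in root.iter("hit"):
        # Report the highest-scoring HSP for each hit.
        best = max(hit.findall("hsp"), key=lambda h: float(h.get("score")))
        label = f"{hit.get('id')} {hit.get('description')}"
        lines.append(f"{label:<50}{best.get('score'):>7}{best.get('evalue'):>9}")
    return "\n".join(lines)


print(render_summary(DOC))
```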

[shifting over to your company's web page]
> WorkingObjects has extended the Blast algorithm to include 
> the generation of XML formatted results. 

  Ahh, I think I see what you did.  Think you'll convince NCBI to
generate output that conforms to your DTD?  It would make automated
data extraction a lot easier.
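As an illustration of why a DTD-driven output format helps automated extraction: once results are XML, pulling out the significant hits takes a few lines with Python's xml.etree.ElementTree and no screen-scraping of the text report. The element names and the significant_hits() helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up XML layout for illustration; a real DTD would define the names.
DOC = """<blast database="swissprot">
  <hit id="hit_1">
    <hsp score="250" evalue="1e-60"/>
    <hsp score="45" evalue="1e-4"/>
  </hit>
  <hit id="hit_2">
    <hsp score="40" evalue="0.002"/>
  </hit>
</blast>"""


def significant_hits(xml_text, max_evalue):
    """Return (hit id, best E-value) for each hit whose best HSP passes max_evalue."""
    root = ET.fromstring(xml_text)
    result = []
    for hit in root.findall("hit"):
        evalues = [float(hsp.get("evalue")) for hsp in hit.findall("hsp")]
        if evalues and min(evalues) <= max_evalue:
            result.append((hit.get("id"), min(evalues)))
    return result


print(significant_hits(DOC, 1e-3))  # [('hit_1', 1e-60)]
```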

> I know Perl reigns in bioinformatics but since the majority of my work
> is in Java I naturally chose it for implementation of the initial
> version. A Perl implementation of BlastXML is being planned - pending a
> better understanding of bioperl components. 

  My implementation was in Perl with ideas borrowed from XML/SGML
parsing, but for the life of me I couldn't figure out Perl5 objects
well enough to describe them to the other developers, so it was
done with anonymous hashes and namespace functions.  (Yes, I realize
that's close to Perl5 objects -- probably missing a bless or
something -- but it's easier to explain.)
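For comparison, the same "hashes plus functions" style translates almost directly into Python: a plain dict holds the parse state, and another dict maps element names to handler functions. This is a sketch of the general technique (a dispatch table), not a reconstruction of the original Perl code; the tags, fields and function names are made up.

```python
# No classes: a dict of state plus a dict mapping event names to functions.
# Event names and fields are hypothetical.

def start_hit(state, attrs):
    state["hits"].append({"id": attrs["id"], "hsps": []})


def start_hsp(state, attrs):
    # Attach the HSP score to the most recent hit.
    state["hits"][-1]["hsps"].append(float(attrs["score"]))


HANDLERS = {
    "hit": start_hit,
    "hsp": start_hsp,
}


def dispatch(events):
    state = {"hits": []}
    for tag, attrs in events:
        handler = HANDLERS.get(tag)
        if handler is not None:  # unknown tags are silently skipped
            handler(state, attrs)
    return state


state = dispatch([
    ("blast", {}),
    ("hit", {"id": "hit_1"}),
    ("hsp", {"score": "250"}),
    ("hsp", {"score": "40"}),
])
print(state)  # {'hits': [{'id': 'hit_1', 'hsps': [250.0, 40.0]}]}
```

Adding support for a new element is just one more function and one more dict entry, which is much of what makes this style easy to explain.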

  Anyway, my preference for this sort of work these days is Python.
(After all, I'm sitting here with one of my Python t-shirts on :)
There are several XML parsers available, and one or two DOMs for
the C implementation.  The Java implementation of Python can access all
the existing Java libraries, which can also be handy.  Take a look
at http://www.python.org and http://www.python.org/jpython .
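With one of Python's XML parsers, a BlastElementHandler-style layer can sit on top of generic parser events. The sketch below uses the xml.sax module from the current Python standard library: a ContentHandler subclass turns startElement events into BLAST-level records. The document layout and the BlastSaxAdapter class are hypothetical.

```python
import xml.sax

# Hypothetical XML layout (made-up element names: blast/hit/hsp).
DOC = b"""<blast database="swissprot">
  <hit id="hit_1"><hsp score="250" evalue="1e-60"/></hit>
  <hit id="hit_2"><hsp score="40" evalue="0.002"/></hit>
</blast>"""


class BlastSaxAdapter(xml.sax.ContentHandler):
    """Build BLAST-level records from generic SAX start-element events."""

    def __init__(self):
        super().__init__()
        self.hits = []  # list of (hit id, [scores])

    def startElement(self, name, attrs):
        if name == "hit":
            self.hits.append((attrs["id"], []))
        elif name == "hsp":
            self.hits[-1][1].append(int(attrs["score"]))


handler = BlastSaxAdapter()
xml.sax.parseString(DOC, handler)
print(handler.hits)  # [('hit_1', [250]), ('hit_2', [40])]
```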

						Andrew Dalke
						dalke at bioreason.com

More information about the Bio-soft mailing list
