Merging blast output files

Steven Smith ssmith at itis.com
Thu Apr 27 08:41:25 EST 2000

You do need to be careful when comparing e-values.  They take into account
the size of the database searched.  The e-value means "I expect to find X
hits this good by chance when searching this sequence against a database
of size Y."  So e-values are directly comparable only when the Y's match.

If one search was on the full Genbank (big) and the other was just on the
update (small), then set the expected database size for the update search
to the size of Genbank, and the e-values will be (more) comparable.
To be exactly comparable, I think you would need to set the db size to the
total of Genbank+Update for both searches, but let's not get too picky.
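
If the searches have already been run, the same adjustment can be
approximated after the fact.  The sketch below assumes the e-value scales
linearly with database size (it comes from E = K*m*n*exp(-lambda*S), where
n is the database length) and ignores the effective-length corrections
BLAST applies, so treat the results as approximate.  The function name and
the database sizes are made up for illustration.

```python
def rescale_evalue(evalue, searched_db_size, target_db_size):
    """Approximate the e-value a hit would have had against another db size.

    Assumes the e-value is proportional to database size; this ignores
    BLAST's effective-length corrections, so the result is an estimate.
    """
    if searched_db_size <= 0 or target_db_size <= 0:
        raise ValueError("database sizes must be positive")
    return evalue * (target_db_size / searched_db_size)


# Hypothetical sizes in residues, for illustration only.
full_size = 1_000_000_000      # previous full release
update_size = 50_000_000       # update-only database
combined_size = full_size + update_size

# A hit from the small update search is less significant once it is
# judged against the combined search space.
print(rescale_evalue(1e-6, update_size, combined_size))

# A hit from the full search changes only slightly.
print(rescale_evalue(1e-6, full_size, combined_size))
```

Rescaling both sets of hits to the combined size puts them on the same
footing, so they can then be merged and sorted together.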

Steve Smith
  Genesmith Informatics

Bob Friedman wrote:

> The program MSPCrunch may do what you want.  However, if the blast searching
> is also batched, you can specify the e-value cutoffs in the blast client,
> blastcl3 or blastall.  In the scenario that you want to update the e-value
> for the updated search, well, this value should be accurate given the search
> space and lengths of the sequences; the computed e-value doesn't need to be
> "fixed".
> Bob
> "Gary Williams" wrote:
> > Has anyone got a good way of taking two blast output files (e.g.  one
> > from a search done on the previous release of the full Genbank/EMBL and
> > the second one from a search of the updated entries in the latest
> > release) and merging them so that the statistics (especially E value)
> > are updated accordingly.
