IUBio

searching in comment field

Stuart M. Brown browns02 at mcrcr.med.nyu.edu
Mon Nov 11 16:31:38 EST 1996


> >   I believe that virtually all ESTs are BLAST searched before being
> >   submitted to the database, and I have heard a statistic that about 40% of
> >   them find one or more significant hits.  You should get hundreds of
> >   thousands of hits, no?
> >   Where is this information stored if it is not in the comment or
> >   description fields??
> 
> The information about BLAST hits is generated by NCBI and included in
> the dbEST database (in the dbEST.reports file). The BLAST hit
> information is updated from time to time, but as far as I am aware it
> is not included in the GenBank entries unless it is used to clearly
> identify the EST.
> 
> The dbEST.reports is available through SRS WWW servers at a number of
> sites.
> 
> Meanwhile, I am working on the new SRS 5.0 parsing for dbEST, and will
> certainly be trying to index the BLAST hits in some way. This gets a
> little tricky - for example, it is not trivial to combine the scores
> and the text for a given hit, though it should be possible.
> 
> So, an obvious question: what information would you like to search for
> in the BLAST hit fields?

This is really agonizing.  Here is all of this beautiful data, but apparently no
good way to use it.  I don't think that the SRS indices should be expanded
to include 15 protein and 15 nucleotide hits (and their names and the
significance level of each hit).  It already takes us anywhere from 6 to 36
hours to recreate the SRS indices on our GCG system after each full GenBank
update (and this is on a fast Alpha machine!).  Perhaps the time is ripe for
a new tool - sort of a reverse BLASTer that takes a given sequence and
identifies all ESTs that mention that sequence in their BLAST report.

Think about it - here is all of this information about ESTs, but unless you
already know the accession # of a particular EST, you will never see it.
So everyone has to BLAST their sequence against the ESTs for themselves,
without knowing that the ESTs have already been compared with it.
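
To make the reverse lookup concrete, here is a minimal sketch in Python of the
kind of inverted index I have in mind. It is only an illustration under stated
assumptions: it does not parse the real dbEST.reports format, and the function
name build_reverse_index, the max_p cutoff, and all accessions and P-values in
the sample data are made up. It assumes the hits for each EST have already been
extracted as (hit accession, P-value) pairs.

```python
from collections import defaultdict


def build_reverse_index(est_hits, max_p=1e-5):
    """Map each hit accession to the ESTs whose BLAST report lists it.

    est_hits: dict of EST accession -> list of (hit accession, P-value).
    max_p:    hypothetical significance cutoff; weaker hits are skipped.
    """
    index = defaultdict(list)
    for est_acc, hits in est_hits.items():
        for hit_acc, p_value in hits:
            if p_value <= max_p:
                index[hit_acc].append((est_acc, p_value))
    # For each hit, list the most significant ESTs first.
    for entries in index.values():
        entries.sort(key=lambda entry: entry[1])
    return dict(index)


# Made-up records standing in for parsed BLAST reports (not real data).
est_hits = {
    "EST_A": [("PROT_1", 1e-40), ("PROT_2", 1e-3)],
    "EST_B": [("PROT_1", 1e-12)],
    "EST_C": [("NUC_9", 1e-20)],
}

reverse_index = build_reverse_index(est_hits)
# Which ESTs already report a significant hit to PROT_1?
print(reverse_index.get("PROT_1", []))
```

Note that this only answers the question for sequences that are already in the
databases the ESTs were searched against, since the lookup key is the hit's
accession rather than the sequence itself.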

-- 
Stuart M. Brown, Molecular Biology Consultant 
NYU-MC Research Computing Resource, Dept. of Cell Biology
550 First Ave, New York, NY 10016
Phone: (212)263-7689  FAX: (212)263-8139



More information about the Info-gcg mailing list
