In article <browns02-0611961151380001 at mcrcr1.med.nyu.edu> browns02 at mcrcr.med.nyu.edu (Stuart M. Brown) writes:
> > In article <browns02-3010961305560001 at mcrcr1.med.nyu.edu>,
> > Stuart M. Brown <browns02 at mcrcr.med.nyu.edu> wrote:
> > >We are doing a bunch of BLAST searches with the EST database.
> > >
> > >I just got an idea - since these est's are routinely BLASTed
> > >before they are submitted- and these results are generally noted
> > >in the comment field, why can't I search the EST database for
> > >comments that mention BLAST hits against my sequences of interest?
> > >
> > >I've tried LOOKUP- but the "All text" field does not appear to
> > >include the comment field of the GenBank entry. I also
> > >tried ENTREZ with only a bit more success - yet I can see mentions
> > >of my sequence in the comments when I do my own BLAST searches and
> > >then read the full annotation of the ESTs that we hit.
> > In article <558p26$pgc at dismay.ucs.indiana.edu>, gilbertd at bio.indiana.edu (Don Gilbert) wrote:
> > Stuart,
> >
> > This may be a function of how SRS (or the GCG variant Lookup) is
> > configured at a particular server. At IUBIo Archive, I revised the
> > indexing for SRS to make sure the Genbank comment fields were
> > searchable. As a test of your question, I just now searched
> > the Genbank EST section at IUBIo, searching "Comment" fields
> > for "BLAST", and found 165 matches.
> >
> Don, you found only 165 ESTs that mention "BLAST" in the comment field???
> I believe that virtually all ESTs are BLAST searched before being
> submitted to the database, and I have heard a statistic that about 40% of
> them find one or more significant hits. You should get hundreds of
> thousands of hits, no?
> Where is this information stored if it is not in the comment or
> description fields??
The information about BLAST hits is generated by NCBI and included in
the dbEST database (in the dbEST.reports file). The BLAST hit
information is updated from time to time, but as far as I am aware it
is not included in the GenBank entries unless it is used to clearly
identify the EST.
The dbEST.reports file is available through SRS WWW servers at a number of
sites.
Sadly, our copy has a problem at the moment: it has grown past 2 GB, and
the SGI system running our SRS WWW server does not like that. I plan to
split the file this weekend to work around it (SRS will still look the
same).
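A minimal sketch of that kind of split, assuming (as in the example entry in this post) that every entry in dbEST.reports starts with a `dbEST Id:` line. Whole entries are packed into numbered parts, and a new part is started whenever the next entry would push the current one past a size limit, so no entry is ever cut in half. The function names, the `prefix.N` file naming and the limit are hypothetical, for illustration only; this is not what our SRS setup actually runs.

```python
def iter_entries(lines):
    """Group lines into entries. Assumes each entry starts with 'dbEST Id:'."""
    entry = []
    for line in lines:
        if line.startswith("dbEST Id:") and entry:
            yield entry
            entry = []
        entry.append(line)
    if entry:
        yield entry


def assign_parts(lines, max_bytes):
    """Yield (part_number, entry_lines), starting a new part whenever the next
    whole entry would push the current part past max_bytes.

    Sizes are counted as UTF-8 bytes, so they are approximate for files with
    non-Unix line endings. An entry bigger than max_bytes gets a part to itself.
    """
    part, size = 1, 0
    for entry in iter_entries(lines):
        n = sum(len(line.encode()) for line in entry)
        if size and size + n > max_bytes:
            part, size = part + 1, 0
        size += n
        yield part, entry


def split_file(path, prefix, max_bytes):
    """Stream path into prefix.1, prefix.2, ...; return the number of parts."""
    out, current = None, 0
    with open(path) as src:
        for part, entry in assign_parts(src, max_bytes):
            if part != current:
                if out is not None:
                    out.close()
                out = open("%s.%d" % (prefix, part), "w")
                current = part
            out.writelines(entry)
    if out is not None:
        out.close()
    return current
```

For example, `split_file("dbEST.reports", "dbEST.reports", 1900 * 1024 * 1024)` would write dbEST.reports.1, dbEST.reports.2 and so on, each kept under about 1.9 GB. Only one entry is held in memory at a time, which matters for a file this size.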
Meanwhile, I am working on the new SRS 5.0 parsing for dbEST, and will
certainly be trying to index the BLAST hits in some way. This gets a
little tricky: for example, it is not trivial to combine the scores
and the text for a given hit, though it should be possible.
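As a rough illustration of combining the text and score for each hit, the sketch below pairs each `Neighbor:` description, including any wrapped continuation lines, with the `Pvalue:` line that closes it. It assumes the layout of the example dbEST.reports entry in this post (one `Pvalue:` line ending each hit); the function name and the (description, pvalue) output are hypothetical.

```python
def parse_neighbors(lines):
    """Return (description, pvalue) pairs from a NEIGHBORS block.

    A hit starts at a 'Neighbor:' line, may continue over further lines,
    and ends at its 'Pvalue:' line. Lines outside a hit, such as the
    'Top 15 protein matches' headings, are skipped.
    """
    hits, text = [], None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Neighbor:"):
            text = [line[len("Neighbor:"):].strip()]
        elif line.startswith("Pvalue:") and text is not None:
            # float() tolerates the space after the colon
            hits.append((" ".join(text), float(line[len("Pvalue:"):])))
            text = None
        elif text is not None and line:
            text.append(line)  # wrapped continuation of the current description
    return hits
```

Joining the wrapped lines into one string keeps all the gi numbers and names of a hit together, so a text index and a score index can point at the same hit.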
So, an obvious question: what information would you like to search for
in the BLAST hit fields?
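To make the question concrete, one possible kind of query is "hits whose description mentions a given name, with a P-value at least this good". A minimal sketch, assuming the hits of an entry have already been collected as (description, pvalue) pairs; the function name and the default cutoff of 1e-10 are arbitrary choices for illustration, not an SRS feature.

```python
import re


def search_hits(hits, term, max_pvalue=1e-10):
    """Return hits whose description contains term as a whole word
    (case-insensitive) and whose P-value is at or below max_pvalue,
    best (smallest) P-value first."""
    pattern = re.compile(r"\b%s\b" % re.escape(term), re.IGNORECASE)
    found = [(text, p) for text, p in hits
             if p <= max_pvalue and pattern.search(text)]
    return sorted(found, key=lambda hit: hit[1])
```

For instance, `search_hits(hits, "beta-actin")` on the example entry would return the beta-actin neighbors from both the protein and nucleotide lists, strongest match first.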
An example dbEST.reports entry is included below. Some dbEST entries
have only nucleotide hits, some have only protein hits, and some have
no hits at all. They can have up to 15 hits of each type.
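Because an entry may have protein hits, nucleotide hits, both or neither, a parser has to track which section it is in. A small sketch that counts hits per type, assuming the section headings read "Top N protein matches" and "Top N nucleotide matches" as in the example, and that each hit ends with one `Pvalue:` line; the names are hypothetical.

```python
import re

# Assumed heading format, taken from the example entry.
HEADING = re.compile(r"^Top \d+ (protein|nucleotide) matches$")


def count_hits_by_type(lines):
    """Count hits per section; each 'Pvalue:' line closes one hit.
    Entries without a section simply report zero for that type."""
    counts = {"protein": 0, "nucleotide": 0}
    section = None
    for raw in lines:
        line = raw.strip()
        m = HEADING.match(line)
        if m:
            section = m.group(1)
        elif line.startswith("Pvalue:") and section is not None:
            counts[section] += 1
    return counts
```

On the example entry this would report 15 hits of each type.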
dbEST Id: 1
EST name: EST00001
GenBank Acc: M61954
GenBank gi: 272204
GDB Dsegment: D0S2263E
CLONE INFO
Clone Id: HHCI89
Source: ATCC
Id as DNA: 65129
Id in host: 65128
DNA type: cDNA
PRIMERS
Sequencing: M13 Forward
SEQUENCE
GCCATCCTGCGTCTGGACCTGGCTGGCCGGGACCTGACTGACTACCTCATGAAGATCCTC
ACCGAGCGCGGCTACAGCTTCACCACCACGGCCGAGCGGGAAATCGTGCGTGACATTAAG
GAGAAGCTGTGCTACGTCGCCCTGGACTTCGAGCAAGAGATGGCCACGGCTGCTTCCAGC
TCCTCCCTGGAGAAGAGCTACGAGCTGCCTGACGGCCAGGTCATCACCATTGGCAATGAG
CGGTTCCGCTGCCCTGAGGCACTCTTCCAGCCTTCCTTCCTGGGCATGGAGTCCTGTGGC
ATCCACGGAACTACCTTCAACTCCATCATGAAGTGTGACGTGGACATTCGGAAAGACCTG
TACGGCAACACAGTGCT
Entry Created: May 26 1992
Last Updated: May 26 1992
PUTATIVE ID Assigned by submitter
Actin, gamma, cytoskeletal
LIBRARY
Lib Name: Hippocampus, Stratagene (cat. #936205)
Organism: Homo sapiens
Vector: lambdaZAP-II
Description: Female, 2 years; oligo-dT + random primed cDNA synthesis;
lambdaZAP-II vector, 1.0kb average insert size.
SUBMITTER
Name: Kerlavage AR
Lab: Bioinformatics
Institution: The Institute for Genomic Research
Address: 9712 Medican Center Drive, Rockville, MD 20850 USA
Tel: 3018699056
Fax: 3018699423
E-mail: arkerlav at tigr.org
CITATIONS
Medline UID: 91262645
Title: Complementary DNA sequencing: expressed sequence tags and human
genome project
Authors: Adams,M.D., Kelley,J.M., Gocayne,J.D., Dubnick,M.,
Polymeropoulos,M.H., Xiao,H., Merril,C.R., Wu,A., Olde,B.,
Moreno,R.F., etal
Citation: Science 252: 1651-6 1991
MAP DATA
NEIGHBORS
Top 15 protein matches
Neighbor: gi|1703 (X60733) gamma non-muscle actin [Oryctolagus cuniculus]
gi|231506|sp|P29751|ACTB_RABIT ACTIN, CYTOPLASMIC 1
(BETA-ACTIN). gi|279668|pir||ATRBB actin beta - rabbit
Pvalue: 9.309e-83
Neighbor: gi|576368|pdb|2BTF|A Beta-Actin-Profilin Complex
Pvalue: 9.309e-83
Neighbor: gi|537596 (M24769) actin [Xenopus laevis] gi|280660|pir||A43552
actin - African clawed frog
Pvalue: 9.309e-83
Neighbor: gi|309090 (J04181) A-X actin [Mus musculus]
gi|90260|pir||A31900 actin A(X) - mouse
Pvalue: 9.309e-83
Neighbor: gi|28252 (X00351) beta-actin [Homo sapiens] gi|177968 (M10277)
cytoplasmic beta actin [Homo sapiens] gi|49866 (X03672)
beta-actin (aa 1-375) [Mus musculus] gi|55575 (V01217)
beta-actin [Rattus norvegicus] gi|211237 (L08165) beta-actin
[Gallus gallus] gi|113270|sp|P02570|ACTB_HUMAN ACTIN,
CYTOPLASMIC 1 (BETA-ACTIN). gi|71618|pir||ATHUB actin beta -
human gi|71619|pir||ATMSB actin beta - mouse
gi|279669|pir||ATCHB actin beta - chicken
Pvalue: 9.309e-83
Neighbor: gi|28339 (X04098) gamma-actin [Homo sapiens] gi|178043 (M19283)
gamma-actin [Homo sapiens] gi|57574 (X52815) cytoskeletal
gamma-actin (AA 1-375) [Rattus rattus] gi|309089 (M21495)
gamma-actin [Mus musculus] gi|113278|sp|P02571|ACTG_HUMAN ACTIN,
CYTOPLASMIC 2 (GAMMA-ACTIN). gi|71623|pir||ATHUG actin gamma -
human gi|71624|pir||ATMSG actin gamma - mouse
gi|111332|pir||S11222 actin gamma, cytoskeletal - rat
Pvalue: 9.309e-83
Neighbor: gi|202654 (J00691) cytoplasmic beta actin [Rattus norvegicus]
gi|71620|pir||ATRTC actin beta - rat
Pvalue: 9.309e-83
Neighbor: gi|1334642|gnl|PID|e184505 (X07507) actin [Xenopus borealis]
gi|113271|sp|P15475|ACTB_XENBO ACTIN, CYTOPLASMIC TYPE 1 (BETA
ACTIN). gi|85691|pir||S01077 actin beta, cytoskeletal - Kenyan
clawed frog
Pvalue: 9.309e-83
Neighbor: gi|213273 (M26111) beta-actin [Anser anser]
gi|113267|sp|P14104|ACTB_ANSAN ACTIN, CYTOPLASMIC BETA.
gi|627304|pir||A55001 actin beta - goose
Pvalue: 9.309e-83
Neighbor: gi|63018 (X00182) beta-actin [Gallus gallus]
Pvalue: 9.309e-83
Neighbor: gi|761724 (U20114) beta-actin [Cricetulus griseus]
gi|1351867|sp|P48975|ACTB_CRIGR ACTIN, CYTOPLASMIC 1
(BETA-ACTIN).
Pvalue: 9.309e-83
Neighbor: gi|71621|pir||ATBOB actin beta - bovine (tentative sequence)
Pvalue: 9.309e-83
Neighbor: gi|71625|pir||ATBOG actin gamma - bovine (tentative sequence)
Pvalue: 9.309e-83
Neighbor: gi|809561 (X13055) gamma-actin [Mus musculus]
Pvalue: 9.786e-83
Neighbor: gi|49868 (X03765) put. beta-actin (aa 27-375) [Mus musculus]
gi|387083 (M12481) cytoplasmic beta-actin [Mus musculus]
Pvalue: 1.029e-82
Top 15 nucleotide matches
Neighbor: gi|28251|emb|X00351|HSAC07 Human mRNA for beta-actin
Pvalue: 3.325e-149
Neighbor: gi|28335|emb|X63432|HSACTB H.sapiens ACTB mRNA for mutant
beta-actin (beta'-actin)
Pvalue: 3.325e-149
Neighbor: gi|476331|gb|U07786|SSU07786 Sus scrofa beta actin mRNA,
partial cds.
Pvalue: 2.014e-129
Neighbor: gi|178044|gb|M16247|HUMACTGAA Human gamma-actin mRNA, partial
cds.
Pvalue: 3.857e-129
Neighbor: gi|28338|emb|X04098|HSACTCGR Human mRNA for cytoskeletal
gamma-actin
Pvalue: 6.359e-129
Neighbor: gi|1702|emb|X60733|OCRNAGNMA O.cuniculus mRNA for gamma-non
muscle actin
Pvalue: 2.003e-127
Neighbor: gi|191660|gb|J04181|MUSACTMEL Mouse A-X actin mRNA, complete
cds.
Pvalue: 1.144e-123
Neighbor: gi|49865|emb|X03672|MMACTBR Mouse cytoskeletal mRNA for
beta-actin
Pvalue: 1.202e-123
Neighbor: gi|191581|gb|M12481|MUSACCYB Mouse cytoplasmic beta-actin mRNA.
Pvalue: 1.030e-121
Neighbor: gi|49867|emb|X03765|MMACTBR2 Mouse mRNA for cytoplasmatic
beta-actin (pAL 41; AA 27-375)
Pvalue: 1.698e-121
Neighbor: gi|213272|gb|M26111|GOOACTB Goose beta-actin mRNA, complete
cds.
Pvalue: 2.180e-121
Neighbor: gi|567191|gb|L36342|MOZBEAC Morone saxatilis (striped bass)
beta-actin mRNA, partial cds.
Pvalue: 2.655e-120
Neighbor: gi|211236|gb|L08165|CHKBACTN Gallus gallus beta-actin mRNA,
complete cds.
Pvalue: 3.392e-118
Neighbor: gi|57573|emb|X52815|RRGAMACT Rat mRNA for cytoplasmic-gamma
isoform of actin
Pvalue: 1.184e-117
Neighbor: gi|51042|emb|X13055|MMGACTR Murine mRNA for cytoplasmic
gamma-actin
Pvalue: 1.952e-117
--
------------------------------------------------------------------------
Peter Rice | Informatics Division,
E-mail: pmr at sanger.ac.uk | The Sanger Centre,
Tel: (44) 1223 494967 | Wellcome Trust Genome Campus,
Fax: (44) 1223 494919 | Hinxton, Cambridge, CB10 1SA,
URL: http://www.sanger.ac.uk/~pmr/ | England