searching in comment field

pmr at sanger.ac.uk
Fri Nov 8 05:47:10 EST 1996

In article <browns02-0611961151380001 at mcrcr1.med.nyu.edu> browns02 at mcrcr.med.nyu.edu (Stuart M. Brown) writes:
>   > In article <browns02-3010961305560001 at mcrcr1.med.nyu.edu>,
>   > Stuart M. Brown <browns02 at mcrcr.med.nyu.edu> wrote:
>   > >We are doing a bunch of BLAST searches with the EST database.
>   > >
>   > >I just got an idea - since these est's are routinely BLASTed
>   > >before they are submitted- and these results are generally noted
>   > >in the comment field, why can't I search the EST database for
>   > >comments that mention BLAST hits against my sequences of interest?
>   > >
>   > >I've tried LOOKUP- but the "All text" field does not appear to
>   > >include the comment field of the GenBank entry.  I also
>   > >tried ENTREZ with only a bit more success - yet I can see mentions
>   > >of my sequence in the comments when I do my own BLAST searches and 
>   > >then read the full annotation of the ESTs that we hit.
>   In article <558p26$pgc at dismay.ucs.indiana.edu>, gilbertd at bio.indiana.edu
>   (Don Gilbert) wrote:
>   > Stuart,
>   > 
>   > This may be a function of how SRS (or the GCG variant Lookup) is
>   > configured at a particular server.  At IUBIo Archive, I revised the 
>   > indexing for SRS to make sure the Genbank comment fields were
>   > searchable.  As a test of your question, I just now searched
>   > the Genbank EST section at IUBIo, searching "Comment" fields
>   > for "BLAST", and found 165 matches.
>   > 
>   Don, you found only 165 ESTs that mention "BLAST" in the comment field???
>   I believe that virtually all ESTs are BLAST searched before being
>   submitted to the database, and I have heard a statistic that about 40% of
>   them find one or more significant hits.  You should get hundreds of
>   thousands of hits, no?
>   Where is this information stored if it is not in the comment or
>   description fields??

The information about BLAST hits is generated by NCBI and included in
the dbEST database (in the dbEST.reports file). The BLAST hit
information is updated from time to time, but as far as I am aware it
is not included in the GenBank entries unless it is used to clearly
identify the EST.

The dbEST.reports file is available through SRS WWW servers at a number of sites.

Sadly, our copy has a problem at the moment: it has grown past 2 GB, and
the SGI system running our SRS WWW server has trouble with a file that
large. I plan to split the file this weekend to work around this (SRS
will still look the same).
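For anyone hitting the same size limit, here is a minimal Python sketch of splitting such a file without cutting an entry in half. It assumes, going by the example entry further down, that every entry begins with a "dbEST Id:" line and that the file is Latin-1 text; the chunk names (dbEST.reports.001 and so on) and the max_bytes parameter are made up for illustration, and this is not how SRS itself handles split files.

```python
from pathlib import Path

# Assumption: every entry begins with a line starting "dbEST Id:",
# as in the example entry in this message.
ENTRY_START = "dbEST Id:"


def iter_entries(lines):
    """Yield each entry as one string, starting a new one at ENTRY_START."""
    entry = []
    for line in lines:
        if line.startswith(ENTRY_START) and entry:
            yield "".join(entry)
            entry = []
        entry.append(line)
    if entry:
        yield "".join(entry)


def split_reports(src, out_dir, max_bytes):
    """Write src into numbered chunk files of at most max_bytes each,
    cutting only between entries (a single oversized entry gets a chunk
    of its own). Returns the chunk paths. Chunk names are hypothetical."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks, out, written = [], None, 0
    # Assumption: Latin-1 text, so every byte round-trips unchanged.
    with open(src, encoding="latin-1") as fh:
        for entry in iter_entries(fh):
            data = entry.encode("latin-1")
            # Open a new chunk if this entry would push the current one past max_bytes
            if out is None or (written and written + len(data) > max_bytes):
                if out is not None:
                    out.close()
                path = out_dir / f"dbEST.reports.{len(chunks) + 1:03d}"
                chunks.append(path)
                out = open(path, "wb")
                written = 0
            out.write(data)
            written += len(data)
    if out is not None:
        out.close()
    return chunks
```

Cutting only at entry boundaries means each chunk can be indexed on its own, which is what lets a search front end present the pieces as one database.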

Meanwhile, I am working on the new SRS 5.0 parsing for dbEST, and will
certainly try to index the BLAST hits in some way. This gets a little
tricky: for example, it is not trivial to combine the scores and the
text for a given hit, though it should be possible.
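One way to pair the text of each hit with its score, sketched in Python. It assumes the layout of the example entry below: a "Neighbor:" line, possibly followed by indented continuation lines, closed by a "Pvalue:" line. The function name parse_hits is hypothetical, and this is not the SRS 5.0 parser.

```python
import re

# Assumption: layout as in the example dbEST.reports entry in this message.
NEIGHBOR_RE = re.compile(r"^Neighbor:\s*(.*)$")
PVALUE_RE = re.compile(r"^Pvalue:\s*(\S+)\s*$")


def parse_hits(lines):
    """Return (description, pvalue) pairs, joining the indented
    continuation lines of each Neighbor description into one string."""
    hits = []
    desc = None  # description parts of the hit currently being read
    for line in lines:
        line = line.rstrip("\n")
        m = NEIGHBOR_RE.match(line)
        if m:
            desc = [m.group(1).strip()]
            continue
        m = PVALUE_RE.match(line)
        if m and desc is not None:
            # The Pvalue line closes the hit: attach the score to its text
            hits.append((" ".join(desc), float(m.group(1))))
            desc = None
            continue
        if desc is not None and line[:1].isspace() and line.strip():
            desc.append(line.strip())
    return hits
```

Once each hit is a single (text, score) record, the text can be indexed for keyword searches while the score stays available for ranking or filtering.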

So, an obvious question: what information would you like to search for
in the BLAST hit fields?
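As one illustration of the kind of search that might be useful, here is a small Python sketch that filters hits by a keyword in the description and an optional P-value cutoff. The data are (description, P-value) pairs abridged from the example entry in this message; search_hits and its max_pvalue parameter are hypothetical, not an existing SRS query.

```python
# (description, P-value) pairs abridged from the example entry in this message.
HITS = [
    ("gi|28252 (X00351) beta-actin [Homo sapiens]", 9.309e-83),
    ("gi|809561 (X13055) gamma-actin [Mus musculus]", 9.786e-83),
    ("gi|28251|emb|X00351|HSAC07 Human mRNA for beta-actin", 3.325e-149),
    ("gi|51042|emb|X13055|MMGACTR Murine mRNA for cytoplasmic", 1.952e-117),
]


def search_hits(hits, keyword, max_pvalue=1.0):
    """Return hits whose description contains keyword (case-insensitive)
    and whose P-value is at or below max_pvalue (hypothetical query)."""
    kw = keyword.lower()
    return [(d, p) for d, p in hits if kw in d.lower() and p <= max_pvalue]


print(search_hits(HITS, "beta-actin", max_pvalue=1e-100))
```

For example, searching for the accession X00351 returns both the protein hit and the nucleotide hit derived from it, which is the sort of "hits against my sequence" query asked about earlier in the thread.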

An example dbEST.reports entry is included below. Some dbEST entries
have only nucleotide hits, some have only protein hits, and some have
no hits at all. An entry can list up to 15 hits of each type.
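Because either section may be missing, a parser has to cope with entries that have one, both, or neither. A Python sketch of that, assuming each section is introduced by a header of the form "Top N protein matches" or "Top N nucleotide matches" as in the example below, and that the input is the lines of a single entry. split_sections and count_hits are hypothetical helpers.

```python
def split_sections(lines):
    """Group the lines of ONE entry under its protein and nucleotide hit
    sections. Assumes headers like "Top 15 protein matches"; a section
    the entry does not have maps to an empty list."""
    sections = {"protein": [], "nucleotide": []}
    current = None
    for line in lines:
        words = line.split()
        # Header line: "Top <N> protein|nucleotide matches"
        if len(words) == 4 and words[0] == "Top" and words[3] == "matches":
            current = words[2] if words[2] in sections else None
            continue
        if current is not None:
            sections[current].append(line)
    return sections


def count_hits(section_lines):
    """Count hits in a section; each hit starts with a "Neighbor:" line."""
    return sum(1 for line in section_lines if line.startswith("Neighbor:"))
```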

dbEST Id:	1
EST name:       EST00001
GenBank Acc:    M61954
GenBank gi:	272204
GDB Dsegment:	D0S2263E

Clone Id:	HHCI89
Source:         ATCC
Id as DNA:	65129
Id in host:	65128
DNA type:	cDNA

Sequencing:     M13 Forward


Entry Created:	May 26 1992 
Last Updated:	May 26 1992 

PUTATIVE ID	Assigned by submitter
                Actin, gamma, cytoskeletal

Lib Name:       Hippocampus, Stratagene (cat. #936205)
Organism:       Homo sapiens
Vector:         lambdaZAP-II
Description:    Female, 2 years; oligo-dT + random primed cDNA synthesis;
                lambdaZAP-II vector, 1.0kb average insert size.

Name:           Kerlavage AR
Lab:            Bioinformatics
Institution:    The Institute for Genomic Research
Address:        9712 Medican Center Drive, Rockville, MD 20850 USA
Tel:		3018699056
Fax:		3018699423
E-mail:		arkerlav at tigr.org

Medline UID:	91262645
Title:          Complementary DNA sequencing: expressed sequence tags and human
                genome project
Authors:        Adams,M.D., Kelley,J.M., Gocayne,J.D., Dubnick,M.,
                Polymeropoulos,M.H., Xiao,H., Merril,C.R., Wu,A., Olde,B.,
                Moreno,R.F., etal
Citation:	Science 252: 1651-6 1991



Top 15 protein matches

Neighbor:       gi|1703 (X60733) gamma non-muscle actin [Oryctolagus cuniculus]
                gi|231506|sp|P29751|ACTB_RABIT ACTIN, CYTOPLASMIC 1
                (BETA-ACTIN). gi|279668|pir||ATRBB actin beta - rabbit
Pvalue:		9.309e-83

Neighbor:       gi|576368|pdb|2BTF|A Beta-Actin-Profilin Complex
Pvalue:		9.309e-83

Neighbor:       gi|537596 (M24769) actin [Xenopus laevis] gi|280660|pir||A43552
                actin - African clawed frog
Pvalue:		9.309e-83

Neighbor:       gi|309090 (J04181) A-X actin [Mus musculus]
                gi|90260|pir||A31900 actin A(X) - mouse
Pvalue:		9.309e-83

Neighbor:       gi|28252 (X00351) beta-actin [Homo sapiens] gi|177968 (M10277)
                cytoplasmic beta actin [Homo sapiens] gi|49866 (X03672)
                beta-actin (aa 1-375) [Mus musculus] gi|55575 (V01217)
                beta-actin [Rattus norvegicus] gi|211237 (L08165) beta-actin
                [Gallus gallus] gi|113270|sp|P02570|ACTB_HUMAN ACTIN,
                CYTOPLASMIC 1 (BETA-ACTIN). gi|71618|pir||ATHUB actin beta -
                human gi|71619|pir||ATMSB actin beta - mouse
                gi|279669|pir||ATCHB actin beta - chicken
Pvalue:		9.309e-83

Neighbor:       gi|28339 (X04098) gamma-actin [Homo sapiens] gi|178043 (M19283)
                gamma-actin [Homo sapiens] gi|57574 (X52815) cytoskeletal
                gamma-actin (AA 1-375) [Rattus rattus] gi|309089 (M21495)
                gamma-actin [Mus musculus] gi|113278|sp|P02571|ACTG_HUMAN ACTIN,
                CYTOPLASMIC 2 (GAMMA-ACTIN). gi|71623|pir||ATHUG actin gamma -
                human gi|71624|pir||ATMSG actin gamma - mouse
                gi|111332|pir||S11222 actin gamma, cytoskeletal - rat
Pvalue:		9.309e-83

Neighbor:       gi|202654 (J00691) cytoplasmic beta actin [Rattus norvegicus]
                gi|71620|pir||ATRTC actin beta - rat
Pvalue:		9.309e-83

Neighbor:       gi|1334642|gnl|PID|e184505 (X07507) actin [Xenopus borealis]
                gi|113271|sp|P15475|ACTB_XENBO ACTIN, CYTOPLASMIC TYPE 1 (BETA
                ACTIN). gi|85691|pir||S01077 actin beta, cytoskeletal - Kenyan
                clawed frog
Pvalue:		9.309e-83

Neighbor:       gi|213273 (M26111) beta-actin [Anser anser]
                gi|113267|sp|P14104|ACTB_ANSAN ACTIN, CYTOPLASMIC BETA.
                gi|627304|pir||A55001 actin beta - goose
Pvalue:		9.309e-83

Neighbor:       gi|63018 (X00182) beta-actin [Gallus gallus]
Pvalue:		9.309e-83

Neighbor:       gi|761724 (U20114) beta-actin [Cricetulus griseus]
                gi|1351867|sp|P48975|ACTB_CRIGR ACTIN, CYTOPLASMIC 1
Pvalue:		9.309e-83

Neighbor:       gi|71621|pir||ATBOB actin beta - bovine (tentative sequence)
Pvalue:		9.309e-83

Neighbor:       gi|71625|pir||ATBOG actin gamma - bovine (tentative sequence)
Pvalue:		9.309e-83

Neighbor:       gi|809561 (X13055) gamma-actin [Mus musculus]
Pvalue:		9.786e-83

Neighbor:       gi|49868 (X03765) put. beta-actin (aa 27-375) [Mus musculus]
                gi|387083 (M12481) cytoplasmic beta-actin [Mus musculus]
Pvalue:		1.029e-82

Top 15 nucleotide matches

Neighbor:       gi|28251|emb|X00351|HSAC07 Human mRNA for beta-actin
Pvalue:		3.325e-149

Neighbor:       gi|28335|emb|X63432|HSACTB H.sapiens ACTB mRNA for mutant
                beta-actin (beta'-actin)
Pvalue:		3.325e-149

Neighbor:       gi|476331|gb|U07786|SSU07786 Sus scrofa beta actin mRNA,
                partial cds.
Pvalue:		2.014e-129

Neighbor:       gi|178044|gb|M16247|HUMACTGAA Human gamma-actin mRNA, partial
Pvalue:		3.857e-129

Neighbor:       gi|28338|emb|X04098|HSACTCGR Human mRNA for cytoskeletal
Pvalue:		6.359e-129

Neighbor:       gi|1702|emb|X60733|OCRNAGNMA O.cuniculus mRNA for gamma-non
                muscle actin
Pvalue:		2.003e-127

Neighbor:       gi|191660|gb|J04181|MUSACTMEL Mouse A-X actin mRNA, complete
Pvalue:		1.144e-123

Neighbor:       gi|49865|emb|X03672|MMACTBR Mouse cytoskeletal mRNA for
Pvalue:		1.202e-123

Neighbor:       gi|191581|gb|M12481|MUSACCYB Mouse cytoplasmic beta-actin mRNA.
Pvalue:		1.030e-121

Neighbor:       gi|49867|emb|X03765|MMACTBR2 Mouse mRNA for cytoplasmatic
                beta-actin (pAL 41; AA 27-375)
Pvalue:		1.698e-121

Neighbor:       gi|213272|gb|M26111|GOOACTB Goose beta-actin mRNA, complete
Pvalue:		2.180e-121

Neighbor:       gi|567191|gb|L36342|MOZBEAC Morone saxatilis (striped bass)
                beta-actin mRNA, partial cds.
Pvalue:		2.655e-120

Neighbor:       gi|211236|gb|L08165|CHKBACTN Gallus gallus beta-actin mRNA,
                complete cds.
Pvalue:		3.392e-118

Neighbor:       gi|57573|emb|X52815|RRGAMACT Rat mRNA for cytoplasmic-gamma
                isoform of actin
Pvalue:		1.184e-117

Neighbor:       gi|51042|emb|X13055|MMGACTR Murine mRNA for cytoplasmic
Pvalue:		1.952e-117

Peter Rice                           | Informatics Division,
E-mail: pmr at sanger.ac.uk             | The Sanger Centre,
Tel: (44) 1223 494967                | Wellcome Trust Genome Campus,
Fax: (44) 1223 494919                | Hinxton, Cambridge, CB10 1SA,
URL: http://www.sanger.ac.uk/~pmr/   | England
