Dear colleagues,
The following demonstrates the use of the Sequence Retrieval System (SRS)
software (Thure Etzold, Heidelberg) with the LISTA database (Patrick
Linder(1), Reinhard Doelz(2), Marie-Odile Mosse(3), Jaga Lazowska(3) and
Piotr P. Slonimski(3); 1 Dept. of Microbiology, Biozentrum,
Klingelbergstr. 70, 4056 Basel, Switzerland; 2 Biocomputing, Biozentrum,
Klingelbergstr. 70, 4056 Basel, Switzerland; 3 Centre de Genetique
Moleculaire, Laboratoire propre du CNRS associe a l'Universite Pierre et
Marie Curie, F-91190 Gif sur Yvette, France).
Prerequisites:
(1) Get your software manager to install SRS (available from
ftp.embl-heidelberg.de) on a UNIX or VMS system.
(2) Get your software manager to install all necessary databases and
indices as described in the manuals, including the LISTA database.
(3) Start SRS.
Benefits:
Walk around databases, sequences and links, and gather answers to questions
which you didn't dare to ask before (sorry, that was PR).
_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+
EXAMPLE 1
Let us assume you want to know all occurrences of TIF1 homologues in the
EMBL database.
The approach is to search for TIF1 in LISTA, go to the Homology database
of LISTA on Protein level, look up the resulting entries in EMBL, and
then filter out all yeast entries.
First, you do a search for TIF1 in SRS:
+-------------------------------------------------------------------------+
| ID [I]: TIF1 |
| Synonym [H]: |
| Definition [D]: |
| separate keys by & (AND), | (OR), or ! (AND NOT) |
| |
| query (set) name [Q]: GE1 select library(s) [S]: @ |
| connect fields by AND (1) or OR (2) [X]: 1 |
| do => ([Do]) abort => ([F10]) |
+-------------------------------------------------------------------------+
Then you link the hit through a HOMOLOGY database to EMBL. This HOMOLOGY
database is nothing more than a systematic BLAST search (BLAST from the
NCBI; Altschul, Stephen F., Warren Gish, Webb Miller, Eugene W. Myers,
and David J. Lipman (1990)) against a non-redundant database. It comes
in two flavours: LISTAHOP (on PROTEIN level) and LISTAHON (on DNA level).
1. query: GE1, set of type "Entry-ID", expr: ([GE-ID: TIF1*])
+--------------------------------------------------------------------------+
| |
| query (set) name [Q]: X1 query expression: |
| GE1 > LISTAHOP > EMBL |
| |
| do => ([Do]) abort => ([F10]) |
| |
+--------------------------------------------------------------------------+
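To make the '>' link operator concrete, here is a minimal Python sketch
of what chaining links amounts to conceptually. It is not SRS code: the
function name follow_links, the link tables and the LISTAHOP entry name
are made up for illustration, and the real links live in the SRS indices.

```python
def follow_links(entries, *link_tables):
    """Follow each link table in turn, as in 'GE1 > LISTAHOP > EMBL'."""
    current = set(entries)
    for table in link_tables:
        # Replace the current set by everything it links to in the next library.
        current = {target for entry in current for target in table.get(entry, ())}
    return current


# Hypothetical link tables; only the two EMBL names are taken from this message.
lista_to_listahop = {"LISTA:TIF1": ["LISTAHOP:TIF1"]}
listahop_to_embl = {"LISTAHOP:TIF1": ["EMBL:CEEIF4AM", "EMBL:DMEIF4A"]}

ge1 = {"LISTA:TIF1"}
x1 = follow_links(ge1, lista_to_listahop, listahop_to_embl)
print(sorted(x1))
```

Each '>' simply swaps the current set of entries for the set of entries
they point to in the next library, so the result of X1 is a set of EMBL
entries rather than LISTA entries.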
The next step is slightly tricky, but it gives a good impression of the
power of the SRS system. (You could do it less elegantly but much more
simply.) The step is to filter all yeast entries out of the previous hits.
1. query: GE1, set of type "Entry-ID", expr: ([GE-ID: TIF1*])
2. query: X1, set of type "Seq-ID", expr: GE1 > LISTAHOP > EMBL
+--------------------------------------------------------------------------+
| |
| query (set) name [Q]: X2 query expression: |
| X1 ! [EMBL-ORG:SACC*] |
| |
| do => ([Do]) abort => ([F10]) |
| |
+--------------------------------------------------------------------------+
Here we tell the system to take the results of the previous query, but
to exclude all Saccharomyces entries.
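Conceptually, the '!' (AND NOT) operator is a set difference: everything
in X1 minus everything that matches the organism pattern. The following
Python sketch shows the idea under stated assumptions; the organism table
and the yeast entry name are invented for illustration, and SRS itself
works on its own indices, not on a dictionary like this.

```python
from fnmatch import fnmatchcase

# Assumed organism field per entry. The yeast entry name is hypothetical.
organism_of = {
    "EMBL:CEEIF4AM": "Caenorhabditis elegans",
    "EMBL:DMEIF4A": "Drosophila melanogaster",
    "EMBL:YEAST_EXAMPLE": "Saccharomyces cerevisiae",
}


def match_organism(entries, pattern):
    """Entries whose organism matches a wildcard pattern such as 'SACC*'."""
    return {e for e in entries if fnmatchcase(organism_of[e].upper(), pattern)}


x1 = set(organism_of)                    # stands in for the result of query X1
yeast = match_organism(x1, "SACC*")      # stands in for [EMBL-ORG:SACC*]
x2 = x1 - yeast                          # X1 ! [EMBL-ORG:SACC*]
print(sorted(x2))
```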
The query X2 returned 37 entries; the first few are shown below:
1. entry: EMBL:CEEIF4AM
DE C.elegans mRNA for eIF-4A homologue
2. entry: EMBL:DMEIF4A
DE D.melanogaster gene for eIF-4A eukaryotic translation initiation
DE factor
3. entry: EMBL:DMHELI
DE D.melanogaster RNA helicase mRNA, complete cds.
4. entry: EMBL:DMRM62RH
DE Drosophila melanogaster RM62 mRNA for novel RNA helicase
5. entry: EMBL:DMRNAHEL
DE Drosophila melanogaster RNA helicase gene, complete cds.
6. entry: EMBL:DMVASA
DE D.melanogaster antigen Mab46F11 (vasa) mRNA, complete cds.
7. entry: EMBL:DMVASA2
DE Drosophila melanogaster vasa gene segment 2 (exons 3 to 7)
8. entry: EMBL:ATTIF4A1
DE A.thaliana mRNA for eukaryotic translation initiation factor 4A-1
9. entry: EMBL:ATTIF4A2
...
_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_
EXAMPLE 2
We would like to know which DEAD motif proteins are known in the LISTA
database.
The approach is to start with PROSITE and simply map this to LISTA.
[G] General [O] SetOptions [U] Query [H] Help
+--------------------+
| [Y] RepeatQuery |
| [X] Expression |
| [Q] QueryReport |
| [W] MakeWild |
| [G] Genes |
| [B] GeneHomologies |
| [S] Sequence |+----------------+
| [R] SeqRelated... || [P]>PROSITE |
| [L] Literature || [D] PROSITEDOC |
| [H] SearchLists || [B] BLOCKS |
+--------------------+| [U] EPD |
| [E] ECD |
| [Z] ENZYME |
| [R] REBASE |
+----------------+
+-----------------------------------------------------------------------+
| ID [I]: DEAD |
| Accession [N]: |
| Definition [D]: |
| separate keys by & (AND), | (OR), or ! (AND NOT) |
| |
| query (set) name [Q]: Q1 |
| connect fields by AND (1) or OR (2) [X]: 1 |
| do => ([Do]) abort => ([F10]) |
+-----------------------------------------------------------------------+
We find one entry there. Then we select the 'link' option, and the screen
looks like this:
[G] General [O] EntryOptions [U] Query [H] Help
1. entry: +-----------------+LICASE
DE DEAD-box| [E] ShowEntry |ndent helicases signature.
| [Q] Quit |
| [D] DeleteEntry |
| [C] CopyEntry |+--------------------+ +-----------+
| [L] LinkEntry || [G]>Genes | | [L]>LISTA |
| [S] SearchBuff || [B] GeneHomologies | +-----------+
| [H] SaveBuff || [S] Sequence |
| [X] o TextData || [R] SeqRelated... |
| [Y] o Data || [L] Literature |
| [Z] o Text || [H] SearchLists |
+-----------------++--------------------+
If we choose G (Genes) and L (LISTA), we see at the bottom of the screen:
libraries - Mapped to "SWISSPROT" -> 33 entries
libraries - Mapped to "EMBL" -> 38 entries
libraries - Mapped to "LISTA" -> 14 entries
So SRS takes us from PROSITE to LISTA automatically, via SWISSPROT and EMBL.
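How SRS chooses this route internally is not shown here. One plausible
way to picture it is a breadth-first search over a graph that records
which library links to which. The sketch below is only that, a picture:
the graph is assumed from the links mentioned in this message, and
find_route is a hypothetical helper, not an SRS function.

```python
from collections import deque


def find_route(links, start, goal):
    """Return the shortest chain of libraries from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in links.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Assumed link graph, pieced together from the examples in this message.
links = {
    "PROSITE": ["SWISSPROT"],
    "SWISSPROT": ["EMBL", "MEDLINE"],
    "EMBL": ["LISTA"],
    "LISTA": ["LISTAHOP", "LISTAHON"],
}

route = find_route(links, "PROSITE", "LISTA")
print(" > ".join(route))
```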
The SRS screen now looks like this:
[G] General [O] LinkOptions [H] Help
1. entry: LISTA:DBP1
RL MOL. MICROBIOL. 5:805-812(1991).
2. entry: LISTA:DBP2
RL MOL. CELL. BIOL. 11:1326-1333(1991).
3. entry: LISTA:DED1
RL J. MOL. BIOL. 152:553-568(1981).
RL NATURE 349:715-717(1991).
4. entry: LISTA:DRS1
RL PROC. NATL. ACAD. USA 89:11131-11135(1992).
5. entry: LISTA:HIS3
RL J. MOL. BIOL. 152:553-568(1981).
6. entry: LISTA:PET56
RL J. MOL. BIOL. 152:553-568(1981).
7. entry: LISTA:PRP5
RL PNAS 87:4236-4240(1990).
8. entry: LISTA:PRP28
RL GENES DEV. 5:629-641(1991).
Navigation mode - coming from "PROSITE:DEAD_ATP_HELICASE", depth: 1
+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+_+
EXAMPLE 3
We have done a search for calmodulin and would like to know whether
the protein is also known in LISTA. If so, we need the homologies of this
entry on DNA level, and we want to align all protein sequences of this
search. The end product will be the output of a TFASTA run.
The approach is to search for calmodulin in SWISSPROT, map the hits to
LISTA, and look up the entry in the LISTAHON nucleotide database. If
everything is installed as described, we can then use TFASTA as
implemented in the GCG package (GCG Inc., Madison) to search the result.
We start by selecting SWISSPROT and searching for calmodulin:
+-------------------------------------------------------------------------+
| ID [I]: |
| Accession [N]: |
| Definition [D]: CALMODULIN |
| Keywords [K]: +-----------------+ |
| Organism [O]: | [S] SWISSPROT | |
| Authors [A]: | [P] PIR | |
| Title [T]: | [E]> EMBL | |
| Reference [R]: | [F] EMBL_NEW | |
| Comment [C]: | [H] GB_NEW | |
| Features [F]: | [N] NRL3D | |
| separate+-----------------+ (OR), or ! (AND NOT) |
| |
| query (set) name [Q]: SQ1 select library(s) [S]: @ |
| connect fields by AND (1) or OR (2) [X]: 1 |
| do => ([Do]) abort => ([F10]) |
+-------------------------------------------------------------------------+
Next, we map the result to LISTA and inspect the result:
2. query: X1, set of type "Entry-ID", expr: SQ1 > LISTA
The screen looks like the one above, and we see
libraries - Mapped to "EMBL" -> 117 entries
libraries - Mapped to "LISTA" -> 5 entries
...done - 5 entries written to set "X1"
1. entry: LISTA:CMD1
RL CELL 47:423-431(1986).
2. entry: LISTA:CMK1
RL EMBO J. 10:1511-1522(1991).
RL J. BIOL. CHEM. 266:12784-12794(1991).
3. entry: LISTA:CMK2
RL EMBO J. 10:1511-1522(1991).
RL J. BIOL. CHEM. 266:12784-12794(1991).
4. entry: LISTA:CMP1
RL EUR. J. BIOCHEM. 204:713-723(1992).
RL MOL. GEN. GENET. 227:52-59(1991).
RL PNAS 88:7376-7380(1991).
5. entry: LISTA:CMP2
RL MOL. GEN. GENET. 227:52-59(1991).
RL PNAS 88:7376-7380(1991).
This doesn't help much. Therefore, we take these five entries back
to SWISSPROT. Note how useful this is: we get the SWISSPROT
descriptions of the LISTA entries with a single operation:
1. entry: SWISSPROT:CALM_YEAST
DE CALMODULIN.
GN CMD1.
2. entry: SWISSPROT:KCC1_YEAST
DE CALCIUM/CALMODULIN-DEPENDENT PROTEIN KINASE TYPE I (EC 2.7.1.123).
GN CMK1.
3. entry: SWISSPROT:KCC2_YEAST
DE CALCIUM/CALMODULIN-DEPENDENT PROTEIN KINASE TYPE II (EC 2.7.1.123).
GN CMK2.
4. entry: SWISSPROT:P2B1_YEAST
DE PROTEIN PHOSPHATASE 2B CATALYTIC SUBUNIT A1 (EC 3.1.3.16) (CALCINEURIN
DE A1) (CALMODULIN-BINDING PROTEIN 1).
GN CNA1 OR CMP1.
5. entry: SWISSPROT:P2B2_YEAST
DE PROTEIN PHOSPHATASE 2B CATALYTIC SUBUNIT A2 (EC 3.1.3.16) (CALCINEURIN
DE A2) (CALMODULIN-BINDING PROTEIN 2).
GN CNA2 OR CMP2.
The entry we actually need is CMD1, since it is calmodulin itself.
We briefly check the entry in MEDLINE:
[G] General [O] EntryOptions [U] Query [H] Help
1. entry: +-----------------+T
DE CALMODUL| [E] ShowEntry |
GN CMD1. | [Q] Quit |
2. entry: | [D] DeleteEntry |T
DE CALCIUM/| [C] CopyEntry |+--------------------+I (EC 2.7.1.123).
GN CMK1. | [L] LinkEntry || [G] Genes |
3. entry: | [S] SearchBuff || [B] GeneHomologies |
DE CALCIUM/| [H] SaveBuff || [S] Sequence |II (EC 2.7.1.123).
GN CMK2. | [X] o TextData || [R] SeqRelated... |+-------------+
4. entry: | [Y] o Data || [L] Literature || [M]>MEDLINE |
DE PROTEIN | [Z] o Text || [H] SearchLists |+-------------+NEURIN
DE A1) (CAL+-----------------++--------------------+
GN CNA1 OR CMP1.
5. entry: SWISSPROT:P2B2_YEAST
DE PROTEIN PHOSPHATASE 2B CATALYTIC SUBUNIT A2 (EC 3.1.3.16) (CALCINEURIN
DE A2) (CALMODULIN-BINDING PROTEIN 2).
GN CNA2 OR CMP2.
libraries - Mapped to "MEDLINE" -> 3 entries
[G] General [O] EntryOptions [U] Query [H] Help
1. entry: MEDLINE:87028234
2. entry: MEDLINE:87228267
3. entry: MEDLINE:93278279
We could now look at the entry:
UI - 87028234
AU - Davis TN
AU - Urdea MS
AU - Masiarz FR
AU - Thorner J
TI - Isolation of the yeast calmodulin gene: calmodulin is an essential protein.
MH - Amino Acid Sequence
MH - Base Sequence
MH - Calcium/METABOLISM
MH - Calmodulin/*GENETICS/ISOLATION & PURIFICATION
MH - DNA, Fungal/*ISOLATION & PURIFICATION
... up to the abstract (if one exists).
We go back, map one of the five entries found earlier (CMD1) to
LISTAHON, and get
1. entry: LISTAHON:SCCMD1
GN CMD1
HT >fun
HT emb|M14760|SCCMD1 Yeast (S.cerevisiae) CMD1 gene encoding calmodulin,
HT complete cds. >genbank:gb|M14760|YSCCMD1 Yeast (S.cerevisiae) CMD1
HT gene encoding calmodulin, complete cds.
HT Length = 844
Now this is nearly what we want. What we really need is a file of entry
names that we can use in a TFASTA search. LISTAHON has links to EMBL, so
we go to EMBL and find 31 EMBL sequences which have homologies to the
CMD1 gene on DNA level (there were 131 on protein level in this case).
From the set of entries in EMBL we keep the current set
[G] General [O] LinkOptions [H] Help
1. entry: +------------------+
DE Candida | [E]>ShowEntry |gene, complete cds.
RT "The iso| [L] LinkEntry |ization of a calmodulin-encoding gene
RT (CMD1) f| [B] Back |ngus Candida albicans";
RL Gene 106| [D] DeleteEntry |
2. entry: | [C] CopyEntry |
DE Yeast (S| [T] SelectFields |ne encoding calmodulin, complete cds.
RT "Isolati| [K] KeepSet |odulin gene: Calmodulin is an
RT essentia| [S] SearchBuff |
RL Cell 47:| [H] SaveBuff |
3. entry: | [X] o TextData |
DE A.califo| [Y] o Data |dulin
RT "Structu| [Z] o Text | the Aplysia californica Calmodulin
RT Gene"; +------------------+
RL J. Mol. Biol. 216:545-553(1990).
4. entry: EMBL:DDCAL
DE D.discoideum calmodulin mRNA, partial cds.
RT "Identification of the single gene for calmodulin in Dictyostelium
RT discoideum";
and write a file of entry names:
[G] General [O] SetOptions [U] Query [H] Help
1. query: SQ1, set of type "Seq-ID", expr: ([SQ-DEF: CALMODULIN*])
2. query: X1, set of type "Entry-ID", expr: SQ1 > LISTA
3. query: X2, set of type "Seq-ID", expr: X1 > SWISSPROT
4. query: L1, set of type "Entry-ID", expr: [SWISSPROT-ID: CALM_YEAST] ...
5. query: L2, set of type "Seq-ID", expr: [LISTAHON-ID: SCCMD1] > EMBL
Write file of entry names - filename: L2.FIL
This file of entry names can then be searched with TFASTA. We exit
the SRS program and invoke the GCG package. Next, we specify
% tfasta swissprot:calm_yeast @L2.FIL -default
(on VMS:
$ TFASTA SWISSPROT:CALM_YEAST @L2.FIL /DEFAULT
)
... CPU time: 0:00:06
Output File: calm_yeast.tfasta
(Peptide) TFASTA of: calm_yeast from: 1 to: 147 December 15, 1993 11:48
...
TO: @L2.FIL Sequences: 31 Symbols: 26,236 Word Size: 2
The best scores are: frame init1 initn opt..
em_fun:sccmd1 Yeast (S.cerevisiae) CMD1 gene encoding ca...(2) 628 628 628
em_in:slcalmodu Stylonychia lemnae calmodulin gene, comp...(3) 455 455 490
em_in:ptcam P.tetraurelia calmodulin gene, complete cds (1) 453 453 488
em_in:s68025 CAM=calmodulin [Paramecium tetraurelia, Gen...(3) 453 453 488
em_in:tpcalw T.pyriformis mRNA for calmodulin (2) 453 453 489
em_in:ttcalm T.thermophila mRNA for calmodulin (1) 452 452 488
em_in:accalm A.californica mRNA for calmodulin (2) 439 439 479
em_ro:mmcalmod M.musculus mRNA for calmodulin (3) 436 436 475
em_pr:hscalcbp Human calmodulin mRNA, complete cds (1) 436 436 475
em_ov:ggcam Chicken calmodulin (cam) mrna (2) 436 436 475
em_ov:ggcalma Chicken calmodulin mRNA, complete cds (1) 436 436 475
em_ro:rnrcm1 R.norvegicus mRNA for calmodulin (pRCM1) (2) 436 436 475
em_ov:xlcamb X.laevis calmodulin gene, mrna, clone 71 (2) 436 436 475
em_pr:hscam Human calmodulin mRNA, complete cds (2) 436 436 475
em_ro:rncam Rat calmodulin mRNA, complete cds (1) 436 436 475
em_ro:rncama Rat calmodulin mRNA, complete cds (2) 436 436 475
em_pl:mscal1 Alfalfa cal1 mRNA for calmodulin (3) 435 435 471
em_pl:phcalpro Petunia hybrida CAM53 mRNA, complete cds (2) 434 434 470
em_bb:s45905 CaM-A=calmodulin [Oryzias latipes=medaka, m...(2) 423 423 445
em_ov:olcamd O. latipes (killifish) mRNA for calmodulin,...(2) 423 423 445
em_fun:cacmd1 Candida albicans calmodulin gene, complete...(1) 416 416 468
em_in:ddcal D.discoideum calmodulin mRNA, partial cds (1) 407 407 436
em_ro:rncamps Rat calmodulin processed pseudogene, compl...(1) 292 373 410
em_pl:gmcam5 Glycine max calmodulin (SCaM-5) mRNA, compl...(3) 363 363 420
em_ro:rncamii3 R.norvegicus CaMII gene for calmodulin II...(1) 293 293 298
em_ov:ggcam3 Chicken CaM gene encoding calmodulin, exon 3 (3) 172 172 173
em_ov:ggcam4 Chicken CaM gene encoding calmodulin, exon 4 (3) 124 124 124
em_ro:rncamii3 R.norvegicus CaMII gene for calmodulin II...(2) 104 104 128
em_ov:ggcam5 Chicken CaM gene encoding calmodulin, exon 5 (1) 104 104 129
em_ro:rnrcm1 R.norvegicus mRNA for calmodulin (pRCM1) (5) 41 41 52
em_in:accalm A.californica mRNA for calmodulin (3) 36 36 37
em_fun:cacmd1 Candida albicans calmodulin gene, complete...(3) 34 34 36
em_ro:rncamps Rat calmodulin processed pseudogene, compl...(2) 32 32 44
em_bb:s45905 CaM-A=calmodulin [Oryzias latipes=medaka, m...(4) 31 31 39
em_in:s68025 CAM=calmodulin [Paramecium tetraurelia, Gen...(4) 31 31 42
em_in:ptcam P.tetraurelia calmodulin gene, complete cds (4) 31 31 42
em_ov:xlcamb X.laevis calmodulin gene, mrna, clone 71 (6) 29 29 29
em_fun:sccmd1 Yeast (S.cerevisiae) CMD1 gene encoding ca...(4) 29 29 54
em_ov:ggcam2 Chicken CaM gene encoding calmodulin, exon 2 (1) 28 28 30
em_ro:rncamii3 R.norvegicus CaMII gene for calmodulin II...(3) 27 27 27
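Note that several entries appear more than once in this list, once per
reading frame. If you want just the best frame per entry, a few lines of
Python will do. This is a rough sketch, not part of GCG or SRS: it
assumes that every score line ends in "(frame) init1 initn opt" as in
the listing above, and the name best_per_entry is made up.

```python
import re

# name, description, (frame), init1, initn, opt at the end of the line
SCORE_LINE = re.compile(r"^(\S+)\s+(.*?)\s*\((\d)\)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def best_per_entry(lines):
    """Map each entry name to (opt, frame) of its highest-scoring frame."""
    best = {}
    for line in lines:
        m = SCORE_LINE.match(line)
        if not m:
            continue  # skip headers and anything that is not a score line
        name, _desc, frame, _init1, _initn, opt = m.groups()
        if name not in best or int(opt) > best[name][0]:
            best[name] = (int(opt), int(frame))
    return best


# Four lines copied from the TFASTA listing above.
sample = [
    "em_fun:sccmd1 Yeast (S.cerevisiae) CMD1 gene encoding ca...(2) 628 628 628",
    "em_ro:rnrcm1 R.norvegicus mRNA for calmodulin (pRCM1) (2) 436 436 475",
    "em_ro:rnrcm1 R.norvegicus mRNA for calmodulin (pRCM1) (5) 41 41 52",
    "em_fun:sccmd1 Yeast (S.cerevisiae) CMD1 gene encoding ca...(4) 29 29 54",
]

best = best_per_entry(sample)
for name, (opt, frame) in sorted(best.items(), key=lambda kv: -kv[1][0]):
    print(name, "frame", frame, "opt", opt)
```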
Ah well, the TFASTA output could then be reloaded into SRS, etc. etc...
Voila - it is that simple :-)
Regards Reinhard
--
+----------------------------------+-------------------------------------+
| Dr. Reinhard Doelz | RFC doelz at urz.unibas.ch |
| Biocomputing | DECNET 20579::48130::doelz |
|Biozentrum der Universitaet | X25 022846211142036::doelz |
| Klingelbergstrasse 70 | FAX x41 61 261- 6760 or 267- 2078 |
| CH 4056 Basel | TEL x41 61 267- 2076 or 2247 |
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
ftp mirror at nic.switch.ch
-----------------------------------------