[SUMMARY]: "What would like to see in a YEAST Database?" (LONG)

Francis Ouellette francis at AZALEA.NLM.NIH.GOV
Mon Oct 11 20:25:11 EST 1993

WARNING:  This is a long message: save it - read it - think about it
          but don't repost it ... (only parts of it, if you must!)


This is an attempt at a summary (!) of a thread which has been a bit 
all over the place ... it will be sprinkled with some of my
thoughts.  These are personal, and do not reflect the opinions of 
my employer or of the institution I work at.  If readers want the 
full text of the email messages, they can retrieve them from the 
Stanford gopher server (genome-gopher.stanford.edu).

This will spark more discussions, and (I hope) more collaborations.

But first I would like to thank those who took the time to participate
in this discussion (in alphabetical order):

Mike CHERRY (MC)                  cherry at cycle.Stanford.EDU
David COORNAERT (DC)              dacoo at vub.ac.be
Fatima CVRCKOVA (FC)              FATIMA at aimp.una.ac.at
Mark JOHNSTON (MJ)                mj at sequencer.wustl.edu
Angelos KALOGEROPOULOS (AK)       angelos at igmors.ups.circe.fr
Geoff KORNFELD (GK)               geoffk at aix00.csd.unsw.OZ.AU
Patrick LINDER (PL)               linder at urz.unibas.ch
Odile OZIER-KALOGEROPOULOS (OOK)  odile at cgmvax.cgm.cnrs-gif.fr
Linda RILES (LR)                  riles at MSuser.WUstl.EDU
Dean H. SAXE (DHS)                dsaxe at emory.edu
David STILLMAN (DS)               Stillman at bioscience.biology.utah.edu 
Michel WERNER (MW)                werner at jonas.saclay.cea.fr
Cliff ZEYL (CZ)                   b7jm at musicb.mcgill.ca

My original post asked about "yeast database(s)" in the 
following way: 

::What would you like to have in it? 

::How would it be organized?

::Where would it run?

::What would you like to have in it?:

These are of course far-reaching questions!  The important one 
is probably the content ... here are some of the suggestions
from me and others:

Strain list (mutants with phenotype description, 
                 with pictures of colonies and/or cells, 
                 and ATCC or ECACC) 
Genetic map/list (Mortimer et al.,)
EMBL/GenBank yeast database 
E-mail directory
Snail Mail directory (from T. Cooper?)
References with abstracts
Riles-Olson cosmid/lambda maps (LR)
LISTA, the list of nuclear encoded-protein coding genes (PL)
Relationships between genes, or phenotypes (OOK)
Methods (OOK)
Metabolic pathways (XDB)
Vector Sequences with markers/restriction map (MW, CZ)
Transcription factor database (GK)

and I am sure we could think of others.

I should point out that this discussion of new databases 
has brought a few database ideas/realities out 
of the closet: 

(1) A new gopher server for Saccharomyces cerevisiae 
    (more info below - MC)
(2) A new version of LISTA, the info on 1400 Saccharomyces 
    nuclear encoded protein coding genes (more info below - PL)
(3) A call for a Yeast Interacting gene Database, or YID 
    (more info below - OOK)

::How would it be organized?:     

There have been many interpretations of this idea ... and different 
views as well ... the two extremes being lots of small databases
(AK) and one large, all-encompassing database (MC,XDB).

Here I would like to state my opinion:

I think that it is difficult to imagine one person or group managing 
all of the databases mentioned above.  But a central tool to look 
at all the different databases is quite an attractive idea.  
Gopher could be such a tool.  I think better things will
evolve from gopher (we already see a great expansion of the use of World 
Wide Web), and gopher does have its limitations, but right now 
it is probably the most "popular" tool that can reach the largest 
number of researchers.

So a place like the gopher server at Stanford (genome-gopher.stanford.edu) 
would be a great repository for all the YEAST-related databases. 
There is a way to link these via gopher (and WAIS), and Mike Cherry 
has a wonderful example of this at his gopher server (e.g. LISTA3 and the 
Riles-Olson lambda map).  The important concept to keep in mind is for these 
"small databases" to be maintained and owned by the curator of each 
respective database.   I have seen (felt?) a worry from some curators of 
small data-sets about losing "control" of their database ... if something 
like a gopher eats it up  :-)    

It is important for gopher managers and data holders to communicate and 
ensure that their data is presented in the best way possible to make it
"gopherizable".   Sometimes just simple things can help (for example, 
using spaces instead of tabs).  Communication is the key!
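
As an illustration of the tabs point, here is a minimal Python sketch for cleaning a flat file before handing it to a gopher manager. The function name `detab`, the 8-column tab stop and the sample record are assumptions made for this example, not part of any existing yeast database.

```python
def detab(text: str, tabsize: int = 8) -> str:
    """Expand tabs to spaces and strip trailing whitespace on every line."""
    return "\n".join(
        line.expandtabs(tabsize).rstrip() for line in text.splitlines()
    )


# Hypothetical flat-file record whose columns were separated with tabs.
record = "GENE1\tchrI\tsome description\t\nGENE2\tchrII\tanother description"

if __name__ == "__main__":
    print(detab(record))
```

Columns then line up the same way in every gopher client, whatever tab width the client happens to use.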

Just a plug for Mike here ... any info you think would be of interest to 
the yeast community can be sent to:

yeast-curator at genome.stanford.edu

Mike seems to be interested in managing this gopher server, and I would 
therefore encourage curators of yeast-related databases to make things 
available to him, to communicate with him, and to work with him.  I think 
this will serve many yeast biologists worldwide.

::Where would it run?:

If we agree to make the information "gopherizable", then we need not 
worry about portability ... gopher clients run on all (most?) 
platforms.  If you have a Mac, PC (DOS), Unix or VMS machine, there is a 
gopher client for it.  If you have a gopher client, you can access a gopher 
server. Here is an excerpt from the gopher FAQ (frequently asked questions):

Q2:  What do I need to access Gopher?

A2:  You will need a gopher "client" program that runs on your local PC
     or workstation
     (note from francis: if you don't know how to set these up, ask your local 

     There are clients for the following systems.  The directory
     following the name is the location of the client on the anonymous
     ftp site boombox.micro.umn.edu ( in the directory

      Unix Curses & Emacs   :  /pub/gopher/Unix/gopher1.12.tar.Z
      Xwindows (athena)     :  /pub/gopher/Unix/xgopher1.2.tar.Z
      Xwindows (Motif)      :  /pub/gopher/Unix/moog
      Xwindows (Xview)      :  /pub/gopher/Unix/xvgopher
      Macintosh Hypercard   :  /pub/gopher/Macintosh-TurboGopher/old-versions *
      Macintosh Application :  /pub/gopher/Macintosh-TurboGopher *
      DOS w/Clarkson Driver :  /pub/gopher/PC_client/
      NeXTstep              :  /pub/gopher/NeXT/
      VM/CMS                :  /pub/gopher/Rice_CMS/ or /pub/gopher/VieGOPHER/
      VMS                   :  /pub/gopher/VMS/
      OS/2 2.0              :  /pub/gopher/os2/
      MVS/XA                :  /pub/gopher/mvs/

     Many other clients and servers have been developed by others, the
     following is an attempt at a comprehensive list.  

      A Microsoft Windows Winsock client "The Gopher Book"

      A Macintosh Application, "MacGopher".
        ftp.cc.utah.edu:/pub/gopher/Macintosh *

      Another Macintosh application, "GopherApp".
        ftp.bio.indiana.edu:/util/gopher/gopherapp *

      A port of the UNIX curses client for DOS with PC/TCP

      A port of the UNIX curses client for PC-NFS

      A beta version of the PC Gopher client for Novell's LAN Workplace
      for DOS

      A VMS DECwindows client for use with Wollongong or UCX

     * Note: these Macintosh clients require MacTCP.

     Most of the above clients can also be fetched via a gopher client
     itself.  Put the following on a gopher server:

       Name=Gopher Software Distribution.

     Or point your gopher client at boombox.micro.umn.edu, port 70 and
     look in the gopher directory.

     There are also a number of public telnet login sites available.
     The University of Minnesota operates one on the machine
     "consultant.micro.umn.edu" ( See Q3 for more
     information about this.  It is recommended that you run the client
     software instead of logging into the public telnet login sites.  A
     client uses the custom features of the local machine (mouse,
     scroll bars, etc.)  A local client is also faster.
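
For readers curious about what a gopher client does on the wire, here is a minimal Python sketch. It assumes the basic Gopher protocol as described in RFC 1436: connect to port 70, send a selector followed by CRLF, and read back tab-separated menu lines ending with a lone period. The names `MenuItem`, `parse_menu` and `fetch` and the selectors in `SAMPLE_MENU` are hypothetical; this is not the code of any of the clients listed above.

```python
import socket
from typing import NamedTuple


class MenuItem(NamedTuple):
    kind: str      # one-character item type: "0" file, "1" directory, "7" search
    title: str
    selector: str
    host: str
    port: int


def parse_menu(raw: str) -> list[MenuItem]:
    """Parse a Gopher menu: type+title TAB selector TAB host TAB port."""
    items = []
    for line in raw.splitlines():
        if line == ".":                      # a lone period ends the listing
            break
        fields = line[1:].split("\t")
        if len(fields) < 4 or not fields[3].strip().isdigit():
            continue                         # skip blank or malformed lines
        items.append(
            MenuItem(line[0], fields[0], fields[1], fields[2], int(fields[3]))
        )
    return items


def fetch(host: str, selector: str = "", port: int = 70,
          timeout: float = 10.0) -> str:
    """Send one selector to a Gopher server and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(selector.encode("latin-1") + b"\r\n")
        chunks = []
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks).decode("latin-1")


# Hypothetical menu text; the selectors are made up for illustration.
SAMPLE_MENU = (
    "1Gopher Software Distribution\t/example/software\tboombox.micro.umn.edu\t70\r\n"
    "7Saccharomyces Genome\t/example/yeast\tgenome-gopher.stanford.edu\t70\r\n"
    ".\r\n"
)

if __name__ == "__main__":
    for item in parse_menu(SAMPLE_MENU):
        print(item.kind, item.title, item.host, item.port)
```

Calling `parse_menu(fetch("boombox.micro.umn.edu"))` would list a server's top-level menu, provided the server is reachable.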


The limitation to gopher access is an Internet connection.  
For the BITNET sites there is a mail version of gopher called "mailgopher".
To get started, send a mail message to:

mailgopher at nusunix1.nus.sg

with this in the body of the text:


and you will get a help document on mailgopher from bitnet.
I have not tried it. Fatima CVRCKOVA has written on the 
yeast newsgroup that it works, although it is probably a pain
to use, and, in her words: 

	"It works and it is a GOOD THING to know about; and
         it is complicated enough to persuade the boss that 
         we really need a direct connection to Internet..."

I would encourage this!

Gopher is just the beginning ... there is much more to come, and
a lot of it will depend on an Internet connection, so better get one soon!

So here ends my "summary" ... what follows are repeats of posts made 
to this discussion that I think warrant repeating.

I am sure this topic is not closed, and there is much more work to do.

regards to all,


| B.F. Francis Ouellette  
| francis at ncbi.nlm.nih.gov   

SPECIAL ADVERTIZING SECTION: (some of the posts that need reposting!)

notes from 

1) Linda Riles
2) Odile Ozier-Kalogeropoulos
3) Patrick Linder
4) Mike Cherry

From: riles at MSuser.WUstl.EDU

	The Olson yeast map database is online as part of the Stanford
yeast genome database via Internet Gopher.
	From a Unix or VMS system:
		gopher genome-gopher.stanford.edu
	From TurboGopher on your Mac, choose the option "Another Gopher"
and enter hostname:
	If you have questions contact Mike Cherry:
		yeast-curator at genome.stanford.edu

I am sending out a newsletter with more detailed information.  If you
would like a copy, e-mail me your postal address.

			Linda Riles
			Mark Johnston's lab


From: odile%FRCGM51.earn at uk.ac.earn-relay
                   Yeast Intergene Database (YID)

Many investigators have identified new genes by synthetic lethal
interactions or by identifying multicopy suppressors which overcome a
mutant phenotype when cloned on a multicopy vector. These methods have
been useful in revealing duplicated genes, identifying unknown
relationships in cellular metabolism and in finding novel regulatory
Information concerning these genetic data is widely
dispersed across many publications and consequently difficult to use. I would
like to create a database of Saccharomyces cerevisiae genes showing such
properties of complex genetic interactions. This database should prove to
be an interesting tool for many investigators studying gene regulation,
regulatory pathways, complex cellular systems, functional analysis of
unknown open reading frames, etc.

To be most useful, this database should be accessible to the entire
scientific community, and this is best achieved electronically by FTP.

To construct this database, I need to collect information about these
genes and their genetic interactions, and this is the reason for my letter. If
you have information about synthetic lethal interactions or more
generally about interacting genes, and you are interested in this database,
please fill in the form below and return it to me by e-mail. I would
greatly appreciate your collaboration.
                                best regards,
                                Odile Ozier-Kalogeropoulos
my mailing address:      CGM/CNRS
                        91190 GIF SUR YVETTE
e-mail address:  Odile at FRCGM51.BITNET
                 Odile at CGMVAX.CGM.CNRS-GIF.FR
telephone:  33 1 69 82 31 61
Fax number: 33 1 69 07 55 39
I.  General information
II.  Description of the gene of interest
III. Description of interacting gene(s)
        A. First interacting gene
                1. Characteristics
                2.  Relationships between the gene of interest (II)
                    and the interacting gene
                3.  Citation information
        B. Second interacting gene
        C, D... Other interacting genes

Wherever possible, please use standard nomenclature to answer questions
concerning the genes. If a question is not applicable to your
genes, answer by writing N.A. in the appropriate space; if the information is
relevant but unknown, write a question mark (?).
For questions with [   ], mark an X in the appropriate [  ] to answer.
If new data become available which would make the database entry more
informative, I urge you to contact me. Thank you.

Your name
e-mail address
Fax number

gene name
cloned   yes [ ]     no [ ]
accession number (if sequenced)
gene product name
sector (metabolic pathway, cell cycle, secretion,...)

If several genes have been found interacting with the gene of interest,
please copy section III of this form, insert it at the end of the text
and fill out this section for each additional gene (B, C, D....)

gene name
cloned    yes    [ ]        no  [ ]
accession number (if sequenced)
gene product name
sector (metabolic pathway, cell cycle, secretion...)

The association of the mutated [  ] or knocked-out [  ] gene of interest
and the mutated [  ] or knocked-out [  ] interacting gene is lethal
(synthetic lethal genes) [  ]
shows a specific phenotype [   ]        What phenotype ?
others [   ]  Please specify
The  mutant [  ] or knocked-out [  ] gene of interest is rescued by the
interacting gene when present on a centromeric plasmid [   ], a multicopy
plasmid [  ], a mutated gene [ ] ?
Other relationships between the gene of interest and the interacting gene

These data are  [  ] published  [ ] in press  [  ] submitted  [ ] in preparation
                [ ] no plans to publish
title of paper
journal                                     volume, first-last pages, year

Please copy section III of the form here if several genes have been found
interacting with the gene of interest and fill out this section for each
additional gene (B, C, D....)
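
The outline of the form (general information, gene of interest, one block per interacting gene) suggests a simple record layout. The Python sketch below is one possible way to hold a returned form in memory. The class and field names are assumptions chosen for this example, the gene names are placeholders, and nothing here describes how YID is actually stored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Gene:
    name: str
    cloned: Optional[bool] = None       # None stands for "?" (unknown)
    accession: Optional[str] = None     # None stands for "N.A."
    product: Optional[str] = None
    sector: Optional[str] = None        # e.g. "cell cycle", "secretion"


@dataclass
class Interaction:
    partner: Gene
    synthetic_lethal: bool = False
    phenotype: Optional[str] = None
    rescued_by: Optional[str] = None    # e.g. "multicopy plasmid"
    publication_status: Optional[str] = None
    citation: Optional[str] = None


@dataclass
class YidEntry:
    submitter: str
    email: str
    gene: Gene                          # section II of the form
    interactions: list[Interaction] = field(default_factory=list)  # section III


# Placeholder entry; GENE1 and GENE2 are not real gene names.
entry = YidEntry(
    submitter="A. Submitter",
    email="submitter at example.org",
    gene=Gene(name="GENE1", cloned=True, sector="cell cycle"),
)
entry.interactions.append(
    Interaction(partner=Gene(name="GENE2"), synthetic_lethal=True,
                publication_status="in press")
)

if __name__ == "__main__":
    for hit in entry.interactions:
        print(entry.gene.name, "<->", hit.partner.name,
              "synthetic lethal" if hit.synthetic_lethal else "other")
```

Keeping one `Interaction` per partner mirrors the instruction to copy section III for each additional gene (B, C, D...).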


From: linder at urz.unibas.ch
Subject: about LISTA3


Here are some comments on the yeast databases.

We have been putting together a yeast database containing only protein coding
sequences (Mosse et al., Curr. Genetics 23, 66-91).

The main problems are:

	- different sequences have the same name
	- the same sequences have different names
	- the names often do not correspond to the list of R.K. Mortimer
	- the sequences often diverge, due to polymorphisms or sequencing

For multiple names we have introduced a "priority rule": the sequence
which was published first gives the name (as far as it is acceptable under the
genetic nomenclature) and the names attributed to allelic sequences are given
as synonyms. This is not always satisfactory and may lead to conflicts 
with the genetic nomenclature. But a rule had to be established, otherwise
we would never get out of the jungle of gene names. In the future, synonyms which 
have come from genetic work will also have to be included.
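
The priority rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each report of an allelic sequence carries a name, a publication year and a flag saying whether the name is acceptable under the genetic nomenclature. The `Report` class, the `apply_priority_rule` function and the sample names are hypothetical and do not come from the LISTA software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    name: str
    year: int
    acceptable: bool = True   # does the name fit the genetic nomenclature?


def apply_priority_rule(reports: list[Report]) -> tuple[str, list[str]]:
    """Return (primary name, synonyms) for reports of allelic sequences."""
    if not reports:
        raise ValueError("at least one report is required")
    ordered = sorted(reports, key=lambda r: r.year)   # earliest first
    candidates = [r for r in ordered if r.acceptable]
    # Fall back to the earliest report if no name is acceptable.
    primary = (candidates or ordered)[0].name
    synonyms: list[str] = []
    for report in ordered:
        if report.name != primary and report.name not in synonyms:
            synonyms.append(report.name)
    return primary, synonyms


# Placeholder names and years, for illustration only.
reports = [
    Report("ABC1", 1985),
    Report("XYZ2", 1983),
    Report("ORF7", 1981, acceptable=False),
]

if __name__ == "__main__":
    primary, synonyms = apply_priority_rule(reports)
    print(primary, synonyms)
```

In this sample the earliest acceptable name, XYZ2, becomes the entry name and the other two are kept as synonyms, so no name is lost from the index.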

We have put all these elements together and release 3 of ListA (1400 entries)
will be published in the upcoming volume "The Yeasts" (Vol 6 "Yeast 
Genetics" Eds. Wheals, Rose and Harrison, Academic Press). It will also soon be
available on bioftp.unibas.ch. 

The yeast database LISTA3 can be used as a flatfile on any personal computer
or can be installed on mainframe computers. We have the ListA database
connected through the SRS system with the EMBL and Genbank databanks.

At present we are also including pointers to Swissprot, chromosomes and 
overlaps with neighbouring sequences. Next will be the function of 
the genes.

We are always very grateful for corrections and helpful comments.

For the authors:

Patrick Linder
Dept. of Microbiology
Klingelbergstr. 70
4056 Basel/Switzerland
Fax +41 61 267 21 18
Email linder at urz.unibas.ch


From: cherry at genome.Stanford.EDU (Mike Cherry)
Subject: a yeast genome database

Just a brief note today. We are building a yeast genome database at
Stanford supported by NCHGR that will be a public resource. There will
be announcements very soon about version 0.1 of the database which
will be accessible via gopher, anonymous ftp and World Wide Web.
We'll have Olson's physical map, Mortimer's genetic info, and pointers
to sequences. However this is just the beginning. We are putting
together a database built on an object-oriented model that will be
accessible via the Internet. Currently we are hoping to support
several different clients that will be available on almost all types
of computers. Different clients will be available to hopefully match
the needs and computer resources of our diverse community. The
database will provide all types of genome information and, as much as
possible, links to other databases. There will be more announcements in
the coming months as this system goes online.

For those that need access to the physical map of the clones available
from the ATCC you can currently look to our Gopher server; let's call
this version 0.0 of the database. Please note that this server will be
moving to a new computer that is on order and due any day. Thus we
might be down for a day or two in the near future. So consider this a
prerelease announcement, this means we are not yet in production mode.

You can use gopher to connect with genome-gopher.stanford.edu to
access the very prerelease yeast database. Look for the menu item:
"Saccharomyces Genome <?>". This is a WAIS search index that contains
information from LISTA2 with the addition of the sequence description,
text pictures of the Washington University clones available from ATCC,
plus some additional information about association of some loci to the
physical map. Queries can include gene symbols, ATCC clone
numbers, or WashU clone numbers. The search modifiers 'and' and 'not'
are available; 'or' is assumed. The wildcard '*' can also be used at
the end of a search word; sorry, but you cannot use it at the
beginning of a word.  A simple example follows; for more information
on searching see the "Help Searching the "<?>" database." item on the
Gopher menu.

Select the "Saccharomyces Genome <?>" item, you will be asked for
"Words to search for". Enter:   adh*

You will see something like:

 -->  1.  Gene Name  (LISTA) : ADH3.
      2.  Gene Name  (LISTA) : ADH4.
      3.  Gene Name  (LISTA) : ADR1.
      4.  Gene Name  (LISTA) : ADC1.
      5.  Gene Name  (LISTA) : ADH1.
      6.  Contig View of ATCC Clone number 70708.
      7.  Contig View of Washington Univ Clone number 6993.
      8.  Contig View of ATCC Clone number 70813.
      9.  Gene Symbol: adh3.
      10. Gene Symbol: adh2.
      11. Gene Symbol: adh1.

Entries 1 to 5 are from LISTA2, entries 6 to 8 are a text view of
the Olson physical map and entries 9 to 11 are information from Linda
Riles about how a gene was located on the physical map.
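
To make the search rules concrete, here is a rough Python sketch of the matching behaviour described above: words are compared without regard to case, several words are treated as 'or', and '*' works only at the end of a word. It is an assumption-laden toy, not the WAIS engine itself, and it leaves out the 'and' and 'not' modifiers because the description does not spell out how they combine.

```python
import re


def matches(query: str, text: str) -> bool:
    """True if any query word matches a word of text ('or' is assumed)."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    for term in query.lower().split():
        if term.endswith("*"):
            prefix = term[:-1]
            if prefix and any(word.startswith(prefix) for word in words):
                return True
        elif term in words:        # a leading '*' is not treated as a wildcard
            return True
    return False


# Titles taken from the sample listing above.
titles = [
    "Gene Name  (LISTA) : ADH3.",
    "Gene Name  (LISTA) : ADR1.",
    "Contig View of ATCC Clone number 70708.",
    "Gene Symbol: adh3.",
]

if __name__ == "__main__":
    for title in titles:
        print(matches("adh*", title), title)
```

The sketch looks only at the title line, so it does not match ADR1 for `adh*`; the real index also searches the text of each entry, which is presumably why such entries appear in the listing.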

A partial listing of item 6 is:

Contig View of ATCC Clone number 70708
Washington Univ Clone number 6364
This clone contains adh3; Determined by hybridization
Partial View of Chromosome XIII (size 900631 bp)
Limits of this partial (23.7%) view are: 321147 to 534795
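
The 23.7% in this listing can be checked by hand. The short Python calculation below assumes that the percentage is simply the length of the displayed interval divided by the chromosome size quoted in the listing; the variable names are chosen for this example.

```python
chromosome_size = 900631                 # bp, from the listing
view_start, view_end = 321147, 534795    # limits of the partial view

span = view_end - view_start             # 213648 bp
percent = 100 * span / chromosome_size

if __name__ == "__main__":
    print(f"{span} bp of {chromosome_size} bp = {percent:.1f}%")
```

The result rounds to 23.7%, which agrees with the figure shown in the contig view.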

All this is very experimental. We are very interested in your input on
how we can make this resource more useful for your work. Very soon we
will have much of this information available via anonymous ftp. The
information in the gopher server will also be increasing. The GemStone
network server database with several network clients will hopefully be
available in early 1994.

J. Michael Cherry                       Project Manager, Yeast Genome Database
Stanford Genome Sequencing Center,      Department of Genetics
Stanford University School of Medicine, Stanford, CA 94305-5120
Voice: 415-723-7541  FAX: 415-723-7016  Internet: cherry at genome.stanford.edu
