
LISTA2 database available by anonymous FTP

Francis Ouellette francis at ego.psych.mcgill.ca
Fri Jun 11 14:28:42 EST 1993

Dear Yeast_bionetters,

things have been a bit quiet on this group ... many must be having a
good time in Madison at the International Yeast meeting.

I was at the "In silico analysis of Yeast Chromosomes" meeting which
was held in Orsay, France a couple of weeks ago, and one of the many
interesting things that was talked about was the availability by
anonymous FTP of LISTA (version 2).

LISTA is a database (list or flatfile) of all known genes from 
Saccharomyces with their gene name, synonyms, accession number, 
length of coding sequence, codon bias, reference and other comments.

It was put together by Patrick Linder, Reinhard Doelz, 
Marie-Odile Mosse, Jaga Lazowska and Piotr P. Slonimski 
(see below for their respective addresses).

I enclose the lista2.doc below, with a few entries at the end, 
so you have an idea of what it looks like.

The anonymous FTP address is:


(if your networked machine has trouble with that address 
 try this IP address

If you do not know about FTP, I have an FTP_starter_kit that 
can help you to get started,

regards to all,




            A comprehensive compilation of nucleotide sequences 
              encoding proteins from the yeast Saccharomyces

                                  User Manual

                             Release 2, March 1993

       Patrick Linder(1), Reinhard Doelz (2), Marie-Odile Mosse(3), 
              Jaga Lazowska(3) and Piotr P. Slonimski(3)

1 Dept. of Microbiology,Biozentrum,Klingelbergstr. 70,4056 Basel,Switzerland

2 Biocomputing,Biozentrum,Klingelbergstr. 70,4056 Basel,Switzerland

3 Centre de Genetique Moleculaire,Laboratoire propre du CNRS associe a 
  l'Universite Pierre et Marie Curie,F-91190 Gif sur Yvette,France
                             Tel +41 61 - 267 21 35
                             FAX +41 61 - 267 21 18
                           Email: linder at urz.unibas.ch

               This manual and the database it accompanies may be
               copied  and  redistributed freely, without advance
               permission,  provided  that  this   statement   is
               reproduced with each copy.


This document describes the format and conventions used in this database, a
comprehensive compilation of nucleotide sequences encoding proteins from the
yeast Saccharomyces. Efforts have been undertaken to make the collected data as
easily accessible as possible without restricting their usefulness to a
particular type of computing environment. For this reason, the simplest possible
organisation ("flat file") has been chosen. It is hoped that users with limited
computing experience or facilities will find this organisation easy to work
with, while those requiring a more complex structure for use with more advanced
tools will find reorganisation straightforward enough to be done by a computer.

The continued development and improvement of the database depends to a
significant degree on user feedback. A User Report Form for this purpose is
provided at the back of this manual; we hope that you will use it if you find
errors, omissions, or something you think could be done better. This is of
particular importance in this release, as LISTA Release 3 is currently being
prepared and will have new features and additions.

We would like to stress that both this manual and the database itself  are  free
from  any  copyright  restrictions (please see the statement on the title page).
While we would appreciate acknowledgement if our efforts  have  been  useful  to
you, we want to ensure that the data are freely available to anyone interested.


The amount of nucleotide sequence data is increasing exponentially. We therefore
compiled this genetic database. Each sequence has been attributed a single
genetic name and, in the case of allelic duplicated sequences, synonyms are
given if necessary. Along with the genetic name, the mnemonic from the EMBL
databank, the codon bias, the reference of the publication of the sequence and
the EMBL accession numbers are included in each entry.

The database, as previously described in the literature [1,2], contains sequence
data assignments from Saccharomyces cerevisiae, Saccharomyces carlsbergensis and
Saccharomyces uvarum, which are believed to be conspecific [3]. Sequences from
the unrelated Schizosaccharomyces pombe, Candida, Hansenula and others are not
included. We also exclude sequences from extragenomic elements such as the
2-micron plasmid, mitochondrial DNA, killer sequences and Ty elements.


The database includes at present a gene name, a synonym in case the same
sequence has been published more than once under different names, the mnemonic,
the length of the coding sequence without the stop codon, the codon bias
according to [4], the reference of the first publication of the sequence, the
accession number and, if necessary, a commentary. Other items such as the
chromosomal localization, description of the gene product, cross-references to
other databases and adjacent genes will be included in the future.


Genetic nomenclature relies on the glossary compiled by [5], which was used
wherever possible. In many cases, however, no gene designation or an incorrect
one has been given to published sequences. Moreover, the same name was given to
different sequences, or different names have been given to the same sequence. To
sort out this problem of nomenclature, a priority rule for naming genes in the
present database [2] was established. According to this rule, the name of the
first published sequence (by date of acceptance of the publication) is used in
the list, provided it is in accordance with the standard genetic nomenclature.
Other names are included as synonyms. In some cases four-letter designations
(ARGR1, MRPL20) or gene names followed by a letter (RPL4A, TIF51A) have also
been used. In the case of historically well established gene designations such
as HO, it was self-evident that they should be retained.
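
The priority rule can be illustrated with a short sketch. It assumes each
candidate name comes with the acceptance date of its publication; the function
name `pick_primary_name` and the data layout are hypothetical, and the check
against the standard genetic nomenclature is not modelled. The dates are those
quoted in the comment lines of the AAA1/NAT1 sample entry at the end of this
post, read here as day.month.year.

```python
from datetime import date


def pick_primary_name(candidates):
    """Return (primary_name, synonyms) ordered by acceptance date.

    `candidates` is a list of (name, acceptance_date) pairs. The earliest
    accepted name becomes the gene name; the others become synonyms.
    """
    ordered = sorted(candidates, key=lambda pair: pair[1])
    primary = ordered[0][0]
    synonyms = [name for name, _ in ordered[1:]]
    return primary, synonyms


names = [("NAT1", date(1989, 2, 23)), ("AAA1", date(1989, 1, 23))]
print(pick_primary_name(names))  # ('AAA1', ['NAT1'])
```
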
Sequences of open reading frames which occur more than once may represent
allelic sequences originating from the same gene or non-allelic sequences from
duplicated genes. This database distinguishes between these two cases by
comparing the 5' and 3' non-coding sequences, which in general diverge
considerably in non-allelic duplicated genes but are highly similar or identical
in allelic sequences. Exceptions have been discussed [2]. In both cases, the
results of the comparisons are included in the comment lines.
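
The manual does not say how the flanking sequences were aligned, nor what degree
of similarity separates the two cases. The sketch below therefore only shows the
kind of figure reported in the comment lines: the percent identity between two
sequences that are assumed to be already aligned and of equal length.
`percent_identity` is a hypothetical helper, not part of the database.

```python
def percent_identity(a, b):
    """Percentage of identical positions in two pre-aligned sequences.

    Both sequences must be non-empty and of equal length; the comparison
    ignores case. No alignment is performed here.
    """
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5
```

Highly similar 5' and 3' flanking regions would point towards allelic sequences,
strongly diverged ones towards duplicated genes; any cut-off is left to the user.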


Each entry in the database is composed of lines. Different types of lines, each
with its own format, are used to record the various types of data which make up
the entry. Note that each line begins with a two-character line code, which
indicates the type of information contained in the line. The currently used line
types, along with their respective line codes, are listed below. This
arrangement of the database allows easy integration with other databanks. For
example, links between the LISTA database and the EMBL sequence database were
accomplished using the Sequence Retrieval System program [6].
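
As a minimal sketch of how such lines might be read, the snippet below splits a
line into its line code and its content. The layout (two-character code, then
whitespace, then content) is inferred from the sample entries at the end of this
post; `split_line` is a hypothetical name.

```python
def split_line(line):
    """Split one database line into (line_code, content).

    Assumes a two-character code followed by whitespace and the content.
    The terminator line "//" carries no content.
    """
    line = line.rstrip("\n")
    code = line[:2]
    content = line[2:].strip()
    return code, content


print(split_line("DR   EMBL; M23166; SCNACT."))  # ('DR', 'EMBL; M23166; SCNACT.')
print(split_line("//"))                          # ('//', '')
```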

5.1. Gene names and synonyms GN and SY fields 

For the nomenclature, a standard principle for naming gene sequences based on
priority rules was used. A simple method to distinguish duplicated sequences of
one and the same gene from non-allelic sequences of duplicated genes was
employed.

5.2. Sequence Data References DR fields

The nucleotide sequence data are generally quoted in the database as they have
been published in the EMBL database, subject to some conventions adopted by
this database provider.
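
Judging from the sample entries at the end of this post, a DR line carries three
semicolon-separated items ending in a full stop, for example
"EMBL; M23166; SCNACT.". The sketch below splits that content into its parts.
The layout is inferred from those samples rather than specified in this manual,
and `parse_dr` and the dictionary keys are hypothetical names.

```python
def parse_dr(content):
    """Parse DR content such as 'EMBL; M23166; SCNACT.' into a dict.

    Assumes exactly three semicolon-separated parts and an optional
    trailing full stop; raises ValueError otherwise.
    """
    parts = [p.strip() for p in content.rstrip().rstrip(".").split(";")]
    if len(parts) != 3:
        raise ValueError(f"unexpected DR content: {content!r}")
    database, accession, mnemonic = parts
    return {"database": database, "accession": accession, "mnemonic": mnemonic}


print(parse_dr("EMBL; M23166; SCNACT."))
```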

5.3. Literature References RL fields 

The references cited for  an  entry  should  be  considered  a  pointer  to  the
literature  and  not  as  assigning scientific credit for the elucidation of the
sequence.  Although every effort is made to give complete reference information,
occasionally  only  a  secondary  source has been cited.  This has happened most
frequently in cases where a secondary reference has presented the data in a form
easily  entered.   

5.4. Codon Bias CB field

The codon bias was computed as described in [4]. 

5.5. Length LN field

The length of the sequence given in the DR field is quoted. 

5.6. Summary 

Number of fields               Key   Description
-----------------------------  ----  ----------------------------------
always 1 (begins each entry)   GN    gene name
0 or more                      SY    synonym
1 or more per GN or SY         DR    EMBL accession number and mnemonic
1 per DR                       LN    length of sequence
1 per DR                       CB    codon bias
1 per DR                       RL    literature reference
0 or more                      CC    additional comments
1 per entry                    //    end of entry
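
The field summary suggests a simple way to group lines into entries. The sketch
below is one possible reading of it, not a reference parser: `parse_entries` and
the dictionary keys are hypothetical, and attaching each DR to the most recent
GN or SY name is an assumption. The GN and // lines in the sample input were
added here following the table, since the excerpt at the end of this post does
not show them; the gene name AAA1 is taken from that entry's comment lines.

```python
def parse_entries(lines):
    """Group flat-file lines into entries.

    GN opens an entry, // closes it, each DR starts a sequence record under
    the latest GN or SY name, and LN, CB and RL attach to the latest DR.
    """
    entries, entry, name, record = [], None, None, None
    for raw in lines:
        code, content = raw[:2], raw[2:].strip()
        if code == "GN":
            entry = {"gene": content, "synonyms": [], "sequences": [], "comments": []}
            name, record = content, None
        elif entry is None:
            continue  # skip anything outside an entry
        elif code == "SY":
            entry["synonyms"].append(content)
            name = content
        elif code == "DR":
            record = {"name": name, "DR": content}
            entry["sequences"].append(record)
        elif code in ("LN", "CB", "RL") and record is not None:
            record[code] = content
        elif code == "CC":
            entry["comments"].append(content)
        elif code == "//":
            entries.append(entry)
            entry, name, record = None, None, None
    return entries


sample = """\
GN   AAA1
DR   EMBL; M23166; SCNACT.
LN   2562
CB   0.194
RL   J. BIOL. CHEM. 264:12339-12343(1989).
SY   NAT1
DR   EMBL; X15135; SCNAT.
LN   2562
CB   0.194
RL   EMBO J. 8:2067-2075(1989).
CC   The sequences are identical.
//
""".splitlines()

for e in parse_entries(sample):
    print(e["gene"], e["synonyms"], [s["DR"] for s in e["sequences"]])
```

Values are kept as strings so that nothing is lost if a field deviates from the
expected form; converting LN to an integer or CB to a float is left to the caller.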


The LISTA database is available either on diskettes (M.-O. Mosse, Centre de
Genetique Moleculaire, CNRS, F-91190 Gif sur Yvette; mosse at frcgm51.bitnet) or
by anonymous FTP from bioftp.unibas.ch [] on the Internet. New sequences and
comments on the existing database may be sent to P. Linder
(linder at urz.unibas.ch). Release 2 is considered preliminary and is about to be
extended. Further releases are therefore still open to feedback and suggestions,
which should be sent to linder at urz.unibas.ch.

As Release 2 is of a preliminary nature, references in the comment lines (CC)
refer to the publication of the LISTA2 database in [2]. It is anticipated that
these dependencies will be replaced in a future release, and that some of the
syntax in the comment lines will be refined.


This work was supported by grants from the Ministere de l'Education Nationale,
the Ligue Nationale contre le Cancer and the E.E.C. (to P.S.) and by the Swiss
National Science Foundation and Kanton Basel-Stadt (to P.L. and R.D.).


1. Mosse, M.O., Brouillet, S., Risler, J.L., Lazowska, J. and Slonimski,
   P.P. (1988) Curr. Genet. 14, 529-535.
2. Mosse, M.-O., Linder, P., Lazowska, J. and Slonimski, P.P. (1993)
   Curr. Genet. 23, 66-91.
3. Barnett, J.A., Payne, R.W. and Yarrow, D. (1983) (Cambridge
   University Press, Cambridge) 811.
4. Bennetzen, J.L. and Hall, B.D. (1982) J. Biol. Chem. 257, 3026-3031.
5. Mortimer, R.K., Contopoulou, C.R. and King, J.S. (1992) Yeast 8, 817-
6. Etzold, T. and Argos, P. (1993) CABIOS 9, 49-57.

                                USER REPORT FORM

|                                LISTA database                             |
| User feedback will help us to        | Return to:                         |
| improve the quality of the service   | Patrick Linder                     |
| we provide.  Please use this form    | Klingelbergstrasse 70              |
| to report errors, omissions,         | CH 4056 BASEL   , Switzerland      |
| suggestions or other comments to us. | linder at urz.unibas.ch               |
| Name:                                | Address:                           |
|--------------------------------------|                                    |
| Telephone:                           |                                    |
|--------------------------------------|                                    |
| Date:                                |                                    |
| Type of report:  [ ]error   [ ]problem   [ ]suggestion   [ ]comment       |
| Release of database to which this report applies:                         |
| Entry or entries affected:                                                |
| Report (please be as precise as possible - attach listings if necessary): |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|                                                                           |
|         Continue on further sheets (or Xerox form) if necessary           |


the first 3 LISTA2 entries ...

DR   EMBL; M23166; SCNACT.
LN   2562
CB   0.194
RL   J. BIOL. CHEM. 264:12339-12343(1989).           
SY   NAT1 
DR   EMBL; X15135; SCNAT.
LN   2562
CB   0.194
RL   EMBO J. 8:2067-2075(1989).                      
CC    The name is AAA1 (scnact, accepted 23.1.89), the 
CC   synonym NAT1 (scnat, accepted 23.2.89). Both names 
CC   are present in the list of (Mortimer et al. 1989). The 
CC   sequences are identical. 
DR   EMBL; M12514; SCPET9.
LN   927
CB   0.173
RL   MOL. CELL. BIOL. 6:626-634(1986).               
DR   EMBL; M64706; SCBUB2Q-1.
LN   135
CB   0.219
RL   CELL 66:507-517(1991).                          
CC    The sequences are identical. The reading frame of AAC1 
CC   in scsub2q is partial.
DR   EMBL; J04021; SCAAC2.
LN   954
CB   0.691
RL   J. BIOL. CHEM. 263:14812-14818(1988).           
DR   EMBL; M34075; SCAAC3.
LN   954
CB   0.622
RL   J. BIOL. CHEM. 265:12711-12716(1990).           
CC    The sequences are 94.86% (n) and 99.06% (p) identical. 
CC   The identity in the flanking sequences is 100% and 
CC   97.17% for the 5' and 3' regions, respectively.
| B.F. Francis Ouellette   (old address: francis at monod.biol.mcgill.ca)
| new temporary address:  francis at ego.psych.mcgill.ca
