
SAGE analysis..how to use the database?

Mike Cherry cherry at genome.stanford.edu
Mon Nov 2 08:17:11 EST 1998

> Hello!
> Could anyone tell me how to use the SAGE analysis site in the database?
> If I e.g. search for ACT1 (actin) one result (termed coordinate 54247) says
> that this transcript is found in 0-1 copies per cell, whereas another tag in
> the same gene (coordinate 53404) shows 81-84 copies per cell.
> Also the highly abundant transcript in table 1 in the Velculescu-paper in
> Cell vol. 88 1997 sometimes shows 0-1 copies/cell in the database.
> I am very confused!
> Thomas Neergaard
> e-mail: tbfn at biochem.ou.dk


I assume you are referring to the SAGE pages available from SGD:

The direct help line for any of the services provided by SGD is
yeast-curator at genome.stanford.edu.

Your first question is about the discrepancy of the expression level
between unique tags located in the same gene.  Yes, there are
discrepancies between tags in some genes -- this is experimental data.
You are correct in what you observed from the two unique tags within
ACT1.  One was seen many times, the other was not.  The authors of the
SAGE analysis (Velculescu, et al. Cell 88, 243) collected the tags as
they observed them.  The SAGE project identified the tag and the
number of times it was observed from the three growth conditions.  SGD
took the tags, searched the genome for exact matches and localized the
tag to a position within the genome.  Remember that there are several
steps between the isolation of RNA and the detection of a tag
sequence.  As with any experimental data, you should consider exactly
what you are looking at.  The strain used by the SAGE project was
YPH499, whereas the genome sequence came from S288C or FY1679.  There
may have been biases for or against some sequences, other biology may
be going on, or in some cases the ORF model currently stated in the
database may be incorrect.  This latter situation is not likely for
ACT1.  The tag sequence itself may also be in error, although this
would only be a consideration for rare tags.  If that ACT1 tag,
CATGGTCGGTATGG, which was only seen twice (once in S and once in
G2/M), was really CATGGTCTGTATGG, it would hit the gene AQY1, which
had no SAGE hits.  The Cell paper suggests a "sequencing error rate
of about 0.7% per base pair".
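
The exact-match localization and the one-base check on a rare tag can
be sketched in a few lines of Python.  This is only an illustration,
not the code SGD actually runs: the function names (find_tag,
one_base_variants, p_tag_error) and the toy sequence are made up
here, and the error estimate assumes 10 variable bases after the CATG
anchor with independent errors at the quoted 0.7% per base.

```python
# Hypothetical helpers for illustration; not SGD's pipeline.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tag(tag, genome):
    """Return sorted (0-based position, strand) pairs for exact matches."""
    queries = [("+", tag)]
    rc = reverse_complement(tag)
    if rc != tag:  # skip a palindromic tag so it is not counted twice
        queries.append(("-", rc))
    hits = []
    for strand, query in queries:
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(query, start + 1)
    return sorted(hits)


def one_base_variants(tag, anchor_len=4):
    """All tags differing from `tag` by one substitution after the anchor."""
    variants = []
    for i in range(anchor_len, len(tag)):
        for base in "ACGT":
            if base != tag[i]:
                variants.append(tag[:i] + base + tag[i + 1:])
    return variants


def p_tag_error(per_base_rate=0.007, n_bases=10):
    """Chance that at least one of n_bases is miscalled (assumed independent)."""
    return 1 - (1 - per_base_rate) ** n_bases


if __name__ == "__main__":
    rare_tag = "CATGGTCGGTATGG"                 # the rare ACT1 tag from the text
    toy_genome = "AAA" + rare_tag + "TTT"       # toy sequence, not real yeast DNA
    print(find_tag(rare_tag, toy_genome))
    print("CATGGTCTGTATGG" in one_base_variants(rare_tag))
    print(round(p_tag_error(), 3))
```

Under those assumptions, several percent of tags would carry at least
one miscalled base, which is why a tag seen only once or twice is
worth comparing against its one-base neighbours before reading much
into its count.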

For your second question, you didn't mention which of the tags from
table 1 of the Cell paper "sometimes" shows low copies.  Below are the
tags from table 1 and their copies/cell.  As you can see, none of them
has a value of 0 or 1.  Please check your search; if you still find
something other than what is below, let me know.

TAG Sequence     L       S      G2/M
==============  ===     ===     ===
CATGGGTGTTAACG  636     561     519
CATGAGACAAACTG  379     229     396
CATGTACCACTCCT  389     268     269
CATGGGTTTCGGTT  269     245     321
CATGTTGCCAGTCT  270     318     247
CATGGGTGAAAACG  350     260     124
CATGATCGCCGCTC  228     233     219
CATGGGTGCTAAGA  247     198     224
CATGTTAGTTTCTA  205     247     127
CATGTCTCTACTGG  223     221     119
CATGGGTTTTGGTT  169     139     249
CATGGGTCCAGCTT  153     112     253
CATGAATCCAGTTG  145     118     151
CATGTTCGTTCACT  182     84      114
CATGAACAGACCAG  83      70      182
CATGCTGCTCTGGG  123     43      139
CATGGCAATACTAC  72      76      148
CATGGCTCTCCCCC  69      111     114
CATGAAAGACAGAG  99      74      119
CATGTGTCGTGGTG  121     74      89
CATGCCAAGGGTAT  67      77      136
CATGTCTCCAGAAG  40      110     130
CATGGTTTTTCTTT  137     78      63
CATGATCACTGGTG  72      51      152
CATGATGAAGGTTC  42      89      142
CATGGTAGAGCCGG  103     70      99
CATGGGTACTGATG  70      58      143
CATGCCAGATTTGT  71      95      104
CATGGTGCCGTCCA  64      39      146
CATGCAAAACCCAA  69      67      105

If you find errors or problems with anything from SGD please let us
know at yeast-curator at genome.stanford.edu.  We are dedicated to
providing accurate and useful interfaces to the collected data we

We will have an expanded version of the SAGE query page in the next
couple of weeks.  This new version will include an expanded search
form allowing searches using the L, S, G2/M levels plus location within
the genome.


J. Michael Cherry                   Internet: cherry at genome.stanford.edu
Department of Genetics              Stanford University School of Medicine
Medical Center, Room M341           Stanford, California  94305-5120
Voice: 650-723-7541                 FAX: 650-723-7016
