
DNA sequencing/info-theory paper in PNAS

S. A. Modena samodena at csemail.cropsci.ncsu.edu
Tue Mar 23 17:08:31 EST 1993

In article <1olvdkINNbun at shelley.u.washington.edu> venk at stein.u.washington.edu (Venkatesh Murthy) writes:
>I came across:
>Sequencing two DNA templates in five channels by digital compression.
>Michael Nelson, Yanping Zhang, David L. Steffens, Reingard Grabherr
>and James L. Van Etten.  PNAS 90:1647-1651 (March 1 1993).
>ABSTRACT: By applying algebraic coding methods to the Sanger dideoxy-
>nucleotide procedure, DNA sequences of two templates can be determined
>simultaneously in only five reactions and data channels.  A 5:2 data
>compression is accomplished.....
>At first glance I see a nice communication channel style diagram,....
>Shannon & Weaver 1949 book, Huffman, Gatlin..... the works!  ...
>-Venki Murthy
>(venk at u.washington.edu)

Well, I'm rather pleased that you mentioned that article.  I read it this
morning.  Thank you for putting me onto it.

Everything in the article is correct and correctly applied (well, I guess
garbage gets into PNAS every once in a while....)

The gist of the coding scheme is best understood by working out a two-lane
example by hand...diagram a two-lane sequence gel for an example sequence
such as CGATTCCGATT where the left lane has a mixture of C and G terminator
nucleotides and the right lane has a mixture of A and C terminators.  A band
in both lanes reads as C, left lane only as G, right lane only as A.  So
two lanes can decode three nucleotides unambiguously....but we'd have to
add a third lane to accommodate decoding T...which is a "waste" due to
"unused" channel capacity!  So why not use five lanes?  Lanes 1 and 2 decode
three of four nucleotides of mystery template #1 and Lanes 4 and 5 (using
the same mixtures of terminators) decode the same three-of-four of mystery
template #2 and lane 3 is used to decode the fourth-of-four from mystery
templates #1 and #2.... that's the next example to work out by hand, which
is a fun exercise.
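
For anyone who'd rather let the machine do the hand exercise, here is a
rough Python sketch of both examples.  It assumes an idealized gel (a lane
is just a list of band/no-band flags, one per fragment length, no noise or
compressions), and it follows my simplified description above rather than
the exact code used in the paper.  The function names and the second
template "TTAGCCGATCA" are made up for illustration.

```python
# Idealized model: a lane is a list of booleans, one per fragment length,
# True where a band would appear. Real gels are noisier than this.

LEFT = {"C", "G"}    # terminator mix in the left lane (from the post)
RIGHT = {"A", "C"}   # terminator mix in the right lane (from the post)

# (band in left lane, band in right lane) -> base
DECODE = {
    (True, True): "C",
    (True, False): "G",
    (False, True): "A",
    (False, False): "T",   # no band in either lane: inferred, not observed
}

def run_lane(template, terminators):
    """Band pattern for one lane: True wherever the base is a terminator."""
    return [base in terminators for base in template]

def decode_two_lanes(left, right):
    """Read a sequence back from the two mixed-terminator lanes."""
    return "".join(DECODE[pair] for pair in zip(left, right))

def five_lane_gel(template1, template2):
    """Lanes 1-2 carry template 1, lanes 4-5 template 2, lane 3 is T for both.

    Assumes both templates have the same length (hypothetical simplification).
    """
    lane3 = [a == "T" or b == "T" for a, b in zip(template1, template2)]
    return [
        run_lane(template1, LEFT), run_lane(template1, RIGHT),
        lane3,
        run_lane(template2, LEFT), run_lane(template2, RIGHT),
    ]

def decode_five_lanes(lanes):
    """Decode both templates; lane 3 is used as a check on the T calls."""
    seq1 = decode_two_lanes(lanes[0], lanes[1])
    seq2 = decode_two_lanes(lanes[3], lanes[4])
    for i, band in enumerate(lanes[2]):
        if band != (seq1[i] == "T" or seq2[i] == "T"):
            raise ValueError(f"lane 3 disagrees with the T calls at position {i + 1}")
    return seq1, seq2

example = "CGATTCCGATT"
left, right = run_lane(example, LEFT), run_lane(example, RIGHT)
print(decode_two_lanes(left, right))

# Five-lane version with a second, made-up template of the same length.
print(decode_five_lanes(five_lane_gel(example, "TTAGCCGATCA")))
```

Note that in this toy version T is read from the *absence* of a band in
both mixed lanes, and the shared lane 3 only confirms it.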

This is not the way you want to do it IF a *human* is going to read the
lanes off the autorad!  But it is extremely well suited to AUTOMATED
sequencing where throughput/cost trade-offs are critical in a truly
massive sequencing project.

I might add a side comment or two about one of the "critical readers"
acknowledged: Myron Brakke.  Brakke (if it's the same one  :^) ) is a plant
pathologist who years ago simplified the preparative isolation/purification
of plant virus particles from homogenized leaf material by *inventing*
sucrose density gradient centrifugation.......and I recall from a
conversation with Matthew Meselson how critical the timing of that
invention was to him and Stahl (wasn't it) performing the "famous"
experiment that UNequivocally proved the semiconservative model of replication
for ds-DNA...which if you recall, hinges on exactly the *same* decoding
scheme for interpreting the results as I mentioned above for working
through the two-lane sequencing example.  

Just thought I'd mention it.  :^)

|     In person:  Steve Modena     AB4EL                           |
|     On phone:   (919) 515-5328                                   |
|     At e-mail:  nmodena at unity.ncsu.edu                           | 
|                 samodena at csemail.cropsci.ncsu.edu                |
|                 [ either email address is read each day ]        |
|     By snail:   Crop Sci Dept, Box 7620, NCSU, Raleigh, NC 27695 |
         Lighten UP!  It's just a computer doing that to you.    (c)

