
Appendix two from the NAASC Genome Report

Fred Ausubel ausubel at FRODO.MGH.HARVARD.EDU
Fri Oct 7 14:32:53 EST 1994

Following is Appendix two from the NAASC Arabidopsis Genome Report:



Dear Arabidopsis Researchers,

On June 8 and 9 the North American Arabidopsis Steering Committee (NAASC;
current members are Ausubel, Chory, Estelle, Ecker, Meinke and Nina
Fedoroff, substituting for Gloria Coruzzi) will meet at NSF headquarters
along with members of the Multinational Steering Committee, genome experts
and representatives from the federal granting agencies to discuss issues
related to mapping and sequencing of the Arabidopsis genome.  As stated in
"A Long-Range Plan for the Multinational Coordinated Arabidopsis thaliana
Genome Research Project" (NSF document 90-80),  the mission of this project
is to "identify all of the genes by any means and to determine the complete
sequence of the Arabidopsis thaliana genome before the year 2000".

The purpose of the NAASC workshop is to assess the progress in meeting our
short- and long-term goals in the areas of mapping and sequencing of the
Arabidopsis genome and to make specific recommendations to direct future US
efforts for the Multinational Coordinated Arabidopsis Genome Research
Project.  We will produce a short report on the needs for future
genome-related efforts for the US federal funding agencies.

We want everyone to be included in this discussion and so we are soliciting
your input into this report.  Please read over the following questions and
then comment back either to the network or to me directly.  We shall
consider all of your comments at the workshop.

Please read and respond to the following questions:

A.      Do you think that we are ready to begin some level of directed
genome sequencing in the US?

B.      How important is genome sequencing in terms of funding priorities
(vs. placing cDNAs on the map, completion of the physical map, adding more
PCR-based markers to the map, etc.)?

C.      Who should support systematic genome sequencing if it is a big-$ effort?

D.      What impact on Arabidopsis research will be incurred if sequencing
does not begin today (in 2 years; in 5 years,  in 10 years)?

E.      What type of organizational model for genome sequencing would you
support: sequencing centers vs. individual interested labs?

F.      What quality standards would you expect for the sequence: high or
low  accuracy (high accuracy = higher cost)?

G.      Any specific or general comments that you would like to make!

The committee looks forward to your comments on these important issues.


Joe Ecker (for the NAASC)
Univ. of Pennsylvania fax (215) 898-8780


(The numbers below correspond to individual responses to the questionnaire.
Not all respondents answered every question.  All responses were posted on
the network anonymously unless the respondent indicated otherwise.)

A.      Do you think that we are ready to begin some level of directed
genome sequencing in the US?


        1.      After finishing cDNAs

        3.      Presumably the community is interested in obtaining the DNA
sequence because that is one avenue to determining a gene's function.
However, if we invest all our resources in determining sequence, we will
have the sequence of the genes (open reading frames), but we won't know
what they do. I would favor a combination of approaches that simultaneously
improve our ability to assess gene function and that add to the sequence
data base.  As technology improves, sequencing will become more cost
effective.  Perhaps, with the limited resources available, it is best at
this time to delay an all-out effort to sequence the genome until advances
in technology make sequencing more affordable.

        4.      I think there should be some genome sequencing. However,
what is needed is more technology development, not necessarily more
production sequencing.

        5.      No.  Do detailed map first (linked YACs, SSLPs, CAPS etc.)

        8.      Definitely yes

        9.      No

        10.     Although I am not a member of the Arabidopsis community, I
am actively involved in sequencing as part of the Human Genome Project so
thought I'd add my 2-cents to this discussion.  My bottom line is DEFINITELY
YES with a few notes.  The real live cost per base of final sequence at
sequencing centers (ours at the University of Oklahoma, the Washington
University and the Sanger Center) is somewhere between 50 cents and $1.  By
the time an RFA is written and applications are eventually funded, the cost
will be very close to 50 cents/base final sequence (4-5 fold coverage and
accuracy 99.99%) if done in centers dedicated to sequencing.  The cost for
sequencing done in individual research labs is 5-10 fold higher because of
the lack of economy of scale, lack of standardized protocols, and lack of
fully trained personnel dedicated to the task at hand, i.e. megabase
sequencing.  A few facilities such as that set up for C. elegans would be
ideal for large scale sequencing of the Arabidopsis genome at an annual
budget of say $5 million for each center (set up 4 such centers). I also
think this should be done for the human genome as soon as possible as well.
The plan would be that individual groups who have mapped their favorite
regions, containing their favorite gene(s), then would send (or take) their
physical contigs (cosmids/P1's) to a regional center for sequencing. With a
budget of $5 million, cosmids can be sequenced presently at the rate of one
per one to two days via shotgun with 16 ABI sequencers (12 for the initial
shotgun cloning and 4 for closure and error correction).  This approach is
extremely cost effective (as demonstrated by the C. elegans groups) and
would yield 5 million bases of completed sequences per center per year.

        11.     Yes, but not at the expense of any other projects.  This
should only be done with new money.

        12.     Yes, immediately.

        14.     I think it's not the time now to start sequencing the
entire genome of Arabidopsis.  The completion of the physical maps and a
"high density physical map" should have highest priority.  But a limited
sequencing project of about 1 Mb would give us some insight into the genome
structure of Arabidopsis.  There should be enough mapped contigs of this
size by now.

        15.     Yes.  It's clear that for many situations a good physical
map is sufficient, however we have the opportunity to create a tremendous
resource for all plant biologists by increasing our resolution of a plant
genome to the level of DNA sequence.  Enough of the genome is covered in
cosmid and lambda contigs, not to mention YACs, to allow full scale
sequencing to begin.  It seems that in many ways the project is at a
critical point, and delaying may eliminate it from the high-priority category
of the various funding agencies...we may lose the chance to get this off
the ground altogether.  Already it seems that some of the "genome
initiative" momentum of 5 to 7 years ago, when the first-generation maps
were being created and published, is being lost...at least in comparison to
yeast, C. elegans, mouse and humans.  In my opinion the community needs to
get behind this now if it is going to proceed.

        16.     It is not a high priority now.

        17.     Yes, but only if funded by new money.
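
As a rough check on the figures quoted in response 10 above, the per-center
arithmetic can be sketched as follows.  This is only an illustration: the
function name is hypothetical, and the inputs are the respondent's own
numbers ($5 million per center per year, 5 million finished bases per center
per year, 4 centers).

```python
def center_economics(annual_budget_usd, finished_bases_per_year, n_centers):
    """Return (cost per finished base, total annual budget, total bases/year).

    Hypothetical helper; the inputs are taken from response 10, not from
    any official cost model.
    """
    cost_per_base = annual_budget_usd / finished_bases_per_year
    total_budget = annual_budget_usd * n_centers
    total_bases = finished_bases_per_year * n_centers
    return cost_per_base, total_budget, total_bases


# Response 10: $5 million and 5 million finished bases per center per year,
# with 4 such centers.
cost_per_base, total_budget, total_bases = center_economics(
    5_000_000, 5_000_000, 4
)
print(f"cost per finished base: ${cost_per_base:.2f}")
print(f"total annual budget:    ${total_budget:,}")
print(f"total finished bases:   {total_bases:,} per year")
```

On these figures the implied cost is $1 per finished base, the upper end of
the 50 cents to $1 range the respondent quotes, and four centers together
would cost $20 million per year for 20 million finished bases.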

B.      How important is genome sequencing in terms of funding priorities
(vs. placing cDNAs on the map, completion of the physical map, adding more
PCR-based markers to the map, etc.)?


        1.      Placing cDNAs and completing the physical map should be
completed first.

        3.      Mutations are an important and powerful tool that play a
key role in understanding gene functions.  Placing those mutations on a map
is an effective method for determining the number of genes involved, and for
performing additional genetic analysis such as analyzing suppressors.  In
addition, a high resolution genetic map allows map-based cloning.

        I would like to see methods developed that would allow a researcher
to map a mutation (in a few days' work) to a small region corresponding to
a single clone (perhaps a YAC that has been placed on the map).  Thus, in a
few days one could move from a genetic defect to a piece of DNA.  A map of
1 cM resolution would correspond to a marker approximately every 200 kb,
which is about the size of the YACs.  To do this, we need to simultaneously
do two things:  1)  Get a complete physical map 2) Get a high-resolution
and easy to use genetic map - there is some debate over which kind of
marker would be most useful, but general agreement that PCR is faster than
Southern blots.

        If, as an additional part of this, each cDNA that is contained on
each YAC clone had also been identified (and, HOPEFULLY, those cDNAs were on
clones with Agrobacterium borders) one could easily move from a genetic
defect to a complementing cDNA.

        Finally, I would like to see (perhaps through a commercial
supplier) a blot with all of the overlapping YAC clones that correspond to
a complete physical map.  This would allow those who have cloned a gene by
homology (using PCR, etc.) to hybridize their gene to the blot, and thus
place the gene on the map. In some cases, these genes will correspond to
existing mutations.  Thus, it is important to have dense PHYSICAL and

        Because sequencing would be more expensive to develop than the
technologies described above, I would prefer that we obtain these things
first, and then do large-scale sequencing later.

        4.      Completion of the physical map should be a very high
priority. Placing cDNAs on the map and adding more markers second.
Directed genome sequencing of interesting areas would be third.

        5.      More important than mapping cDNAs, less important than
completion of physical map (#1 priority) and adding more PCR-based markers.

        8.      In my opinion, more important than ESTs and RFLPs, but
less important than completion of the physical map.

        9.      A good physical map and a high density genetic map of
PCR-based markers are much higher priorities.  As has been stated already in
this forum, we need to know much more about the genes that are involved in
biological processes of interest.  This can most readily be determined
through genetics.  Once genes have been identified, it would be nice to
have sequence already determined.  However, given the probable trade-offs
involved, i.e. fewer dollars for independent research, good genetic and
physical maps will allow more rapid isolation of genes.  Linked clones in
transformation competent vectors would be useful for rescue experiments.

        10.     Sequencing from regions already mapped will yield this
information directly.

        11.     The physical map is the most important, cDNAs are next and
sequencing last.

        12.     Some money should be allocated to several centers at once.

        14.     In my opinion the highest priority should be the mapping
of the Arabidopsis genome, then the sequencing and mapping of cDNA clones.
The genomic sequencing should have third priority. I would prefer full
length sequenced cDNA clones over sequencing the whole genome.

        15.      It's time to sequence.  The map will "fill up" with
additional markers as needed; the marker resolution (both PCR-based and
RFLPs) is dense enough to allow linkage studies and cloning projects to be
carried out.

        16.     Most important: 1) adding more PCR-based markers to the map,
2) completion of the physical map, 3) placing cDNAs on the map, 4) genome
sequencing.  I believe that this order of priorities would be most helpful so
that the genome project can be of use to researchers who want to start with
a mutation and clone the DNA corresponding to that gene.  At present, this
process is done by chromosome walking or tagging and both are slow and not
so efficient.  In C. elegans, the cloning of genes became much easier when
complementation by cosmids from the freezer became routine.  Starting from
a mapped YAC is still a lot of work without the finer structure map and
smaller clones.

        17.     Adding more markers (especially PCR-based) and completing
the physical map are much more important than sequencing.  The sequencing
will eat up a lot of resources and will not be useful without the markers.
We cannot wait for the availability of the sequence (which would
immediately facilitate marker identification) before mapping our genes!

C.      Who should support systematic genome sequencing if it is a big-$ effort?


        1.      USDA

        3.      Can NSF support this?  It is a big $ effort - no question
about it.  In fact, cost estimates for this could be obtained from NIH and
other organizations currently funding genome projects.  When the NIH genome
effort was established, assurance was given that the genome money would not
be drawn from pools that fund basic research.  Could NSF offer the same
deal? or is this an either/or situation?

        4.      This is a political question.

        5.      Fed budget line item.

        8.      NSF, NIH, DOE and Walmart too, if they will help

        9.      Is a consortium of federal agencies feasible?

        10.     USDA but this is a political issue.

        11.     It does not matter, but it should only be done with new
money.  No money should be taken from existing programs.

        12.     Probably the USDA.  Save NSF money for "small science".

        14.     USDA. But if there is money from other agencies it could
be a concerted effort financed by different agencies.

        15.     I would suggest a special appropriation to USDA and DOE,
jointly, hopefully to take advantage of some of the current DOE genome
center expertise.  Also, it seems that some industries may be interested.
Rumor has it that private money has been discussed to initiate sequencing
the genome of a major crop plant.

        16.     Unclear to me, but other Arabidopsis research funding
should not be decreased to fund this effort.

        17.     If the USDA could get new money that would be great.  An
interagency program with USDA, NSF and DOE may be more practical.  Because
they already have a mechanism set up for the current Tri-agency grants this
should be a possibility at least.  Responder #10 (who is apparently used to
NIH funding levels) does not realize that the $20M effort he/she suggests
USDA fund would cost about as much as the entire current USDA budget for
competitive grants in plant biology!

D.      What impact on Arabidopsis research will be incurred if sequencing
does not begin today (in 2 years; in 5 years,  in 10 years)?


        1.      It needs to be begun w/in 3-5 years

        3.      Right now, we probably waste some money by sequencing genes
in individual labs (perhaps some numbers could be obtained - how many
person hours per gene, cost of supplies, number of genes being sequenced
per year).  If the genome were sequenced, this expense would be eliminated.
As time goes on, the cost of this piece-meal sequencing will increase.
So, we should sequence the genome as soon as we can, BUT, not at the
expense of providing the infrastructure that will allow us to assess
function (i.e. genetic, physical maps as described above).

        4.      If a massive mapping of cDNAs starts then the impact of the
lack of genomic sequencing will be small. The physical map is also very
important, as a physical map is needed for genome sequencing. Because money
is apparently not readily available for genome sequencing of Arabidopsis in
the US the question is moot.

        5.      If not today: none; if not in 2 years: serious
disadvantage towards other organisms; if not in 5 years: we should work on
some other organism then.

        8.      I will be thoroughly ashamed if we are more than 2-3 years
behind worms - the USDA should be doing this full blast!!!!!!

        9.      As mentioned above, the impact may be negative if started
today.  In 2-5 years technology hopefully will have advanced enough to make
it a more cost-effective project.

        10.     The Arabidopsis community will lack the detailed
information needed for the "real" biology that they want to do.

        11.     If Arabidopsis is to continue to compete with other model
systems such as worms and flies, sequencing should start now.  If
Arabidopsis is viewed only as a model system for plants, sequencing can
start in two years.

        12.     Negative and severe.  A genome effort provides a way to
elevate Arabidopsis to world-class status and should be started now.

        14.     In 2 years: having no genomic sequencing program would have
little to no impact.  In 5 years: mapping and cDNA sequencing projects
should be finished and a genomic sequencing program should be under way.
In 10 years: the complete genomic sequence of Arabidopsis should be
completed.

        15.     As I stated above, we may lose a valuable window of
opportunity if we don't start now. Obviously we could still be crawling to
genes in 5 years without this.  We could also lose important information on
genome organization that could indicate the basis for position effects on
transgenes, and ways to get around the position-effect problems, and more.

        16.     Better to wait 2-5 years as the technology improves and the
effort can be undertaken at lower cost and higher efficiency.

        17.     Little impact on research today, cannot predict effects of
longer delays.

E.      What type of organizational model for genome sequencing would you
support: sequencing centers vs. individual interested labs?


        1.      I think that there should be room for both.

        3.      I would strongly support doing this as cheaply as possible.
Because sequencing technology is relying more and more on automation, the
only cost effective way to do this is through a sequencing center.  It is
better if individual labs are supported to do basic research and to tackle
biological problems than to be bogged down in production sequencing.

        4.      Both, but it depends on the scale. I do not believe a 20
Mb/year rate can be done by a collaboration of labs like the European Union
yeast project.

        5.      Sequencing centers vs. individual interested labs?:
Sequencing centers.

        8.      Definitely support individual interested labs - spreads the
money out and helps in other projects too.

        9.      Whichever is the most efficient as determined from the
experience in other organisms.

        10.     Definitely ONLY in sequencing centers.  Let the mapping and
biology be done in individual labs but sequencing is only cost effective if
done in centers.

        11.     Multiple centers will probably be the most efficient.

        12.     Definitely centers. Individual labs could then carry out
physical mapping and other tasks.

        14.     Sequencing centers. Individual labs are just a waste of
resources for mega projects.

        15.     A hybrid...individual labs to organize regions to be
sequenced and to take responsibility for accuracy, etc., and centers to
generate the sequence. Clone order, accuracy, etc. would be reviewed by the
lab in charge of a particular region, and they would work in close
association with the sequencing center.

        16.     Sequencing centers (as was done with C. elegans)

        17.     If it is done in earnest then it would probably best be
done at centers for cost efficiency.

F.      What quality standards would you expect for the sequence: high or
low  accuracy (high accuracy = higher cost)?


        1.      Fairly high, unless the computer programs used for analysis
are able to make up for the higher level of errors.

        3.      We need an accuracy high enough that we don't miss a
significant number of open reading frames.  Given that our resources are
probably more limited than in the other model systems, I don't think we
should push for a higher accuracy than they have, and could probably settle
for a lower one (90-95%??).  Also, because of the introns, we will need to
have some information in order to understand the sequence we get out of
this project.  Thus, before any major sequencing effort, we should have the
cDNAs mapped to physical clones.

        4.      Do the cDNAs at high quality, and then the genome sequence
at low quality.

        5.      High accuracy.

        8.      high accuracy will be cheaper in the long run.

        10.     Highly accurate sequence is the most desirable and can be
done.  In fact, accuracy of the actual bases is not as great a problem as
one would suspect. With 4-5 fold coverage the accuracy is 99.99%, or 1 error
per 10,000 bases. The real problem is in closing gaps between shotgun
generated contigs, not in error correction.  Thus, a 100 kb region with 10
gaps of 100 bases each would have 1000 "error" bases (because they are
unknown) and the sequence would be 99% accurate over 100 kb.  Would
this be useful information?  For some it might be, but then again you might
be missing the more interesting information that would be contained in the
unsequenced gaps.  My preference is for contiguous regions of highly
accurate sequence that can be obtained by 4-5 fold coverage and a
reasonable closure strategy.

        11.     Low accuracy needs to be defined in terms of mistakes/kb
before this can be answered.

        12.     High.  A 5-10 fold redundancy is probably inevitable anyway.

        14.     High accuracy even at higher cost.

        15.     High accuracy will be cheaper in the long run.  High
accuracy...if we do this "wrong", we'll really regret it in the future for
biological, experimental, and political reasons.

        16.     High accuracy will be more efficient in the long run.
Otherwise data may not be interpreted correctly and labs may have to
re-sequence relevant regions.

        17.     A greater number of errors can be accepted in genomic than
in cDNA sequences.  Finding ORFs in genomic sequence is difficult due to
introns anyway.
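
The gap arithmetic in response 10 above can be sketched as follows.  This is
an illustration only: the function name is hypothetical, and the inputs are
the respondent's figures (a 100 kb region, 10 gaps of 100 bases each, and 1
miscalled base per 10,000 sequenced bases at 4-5 fold coverage).

```python
def effective_accuracy(region_bases, n_gaps, bases_per_gap, bases_per_miscall):
    """Return (unknown, expected_miscalls, accuracy_gaps_only, accuracy_combined).

    Hypothetical helper. Bases inside unclosed gaps are counted as errors
    because they are unknown, as response 10 does.
    """
    unknown = n_gaps * bases_per_gap
    sequenced = region_bases - unknown
    # Expected miscalls among the bases that were actually sequenced.
    expected_miscalls = sequenced / bases_per_miscall
    accuracy_gaps_only = 1 - unknown / region_bases
    accuracy_combined = 1 - (unknown + expected_miscalls) / region_bases
    return unknown, expected_miscalls, accuracy_gaps_only, accuracy_combined


unknown, miscalls, gaps_only, combined = effective_accuracy(
    100_000, 10, 100, 10_000
)
print(f"unknown bases in gaps:      {unknown}")
print(f"expected miscalled bases:   {miscalls:.1f}")
print(f"accuracy counting gaps:     {gaps_only:.2%}")
print(f"accuracy, gaps + miscalls:  {combined:.4%}")
```

On these figures the unclosed gaps contribute roughly a hundred times more
unknown bases than base-calling errors do, which is the respondent's point
that closure, not error correction, is the real problem.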

G.      Any specific or general comments that you would like to make.


        2.      I don't think a good case can be made for directed
sequencing yet.  Higher priority should be a higher density genetic map
(comprised of mostly ESTs and PCR-able SSLPs) and "completion" of the
physical map.  Directed sequencing, if attempted, should definitely *not*
be done like the European yeast project, where it's apparently just a way
to get a bunch of money for your lab without actually having to sequence
very much (e.g. the 50 or so labs that were required to generate the mere
300 kb of chromosome III).  The worm guys seem to have been doing things
pretty close to "right" all along.

        3.      It seems unlikely that the stated goal of complete sequence
by the year 2000 will be reached.  The sequence of yeast will most likely
be finished by 1997, and, given that Arabidopsis is at least 5 times
larger, and that little progress has been made thus far, it seems that
revision of the target date is in order.  Realistically, what are the odds
of getting an agency other than NSF to fund this?  Could we make a case
with NIH?  Are the funds available from NSF??  How about DOE or USDA?  If
funds could be obtained to both sequence the genome, and to obtain physical
maps, genetic maps, cDNA maps, etc., that would be great!  Perhaps the
steering committee could contact other funding agencies to make a case for
sequencing Arabidopsis??  Any efforts along those lines would be much
appreciated by the community as a whole.

        4.      In the "Plan for the Multinational Coordinated Arabidopsis
thaliana Genome Research Project" (NSF document 90-80), it is stated that
the mission of this project is to "identify all of the genes by any means
and to determine the complete sequence of the Arabidopsis thaliana genome
before the year 2000".  This seems quite overambitious.  That means that
over the next five years there needs to be an average of 20 Mb sequenced
per year.  That is almost two Saccharomyces (yeast) genomes a year; note
that yeast is not expected to be completed for another two years. To make
an average of 20 Mb/year, this would have to be a really big-money project. Ask
NCHGR what they spend on Waterston's sequencing center per year and
multiply that by at least twenty.

        5.      I find genomic sequencing much more important than cDNA
sequencing.  First, the complete genomic sequence would be the ultimate
map; second, to search for sequence similarities should be fine with
genomic sequence, even without sophisticated coding sequence finders.

        6.      I do not think that a large-scale genome-sequencing project
for Arabidopsis is appropriate at the moment. First, Arabidopsis does not
have a sophisticated genetic infrastructure comparable to either yeast or
C. elegans (e.g., it is not possible to do gene targeting, gene
replacement, obtaining intragenic or extragenic suppressors in an efficient
way, conclusively prove that a mutational phenotype is null by constructing
deficiency over mutant allele heterozygote, or even to target a tag to the
gene although this latter goal may be achieved soon).  Thus, a complete
sequence will lead to a lot of information that is of dubious value
because neither straight nor reverse genetics is yet capable of telling us
what many of these sequences do in terms of function. Second, as a result
of this blind sequencing project funds will be diverted from more
interesting, and useful, genetic experiments conducted in smaller labs to
larger laboratories geared towards technology. I am not opposed to
sequencing the genome per se; but I am opposed to going for a total genome
sequencing without at least an equivalent amount of funds being directed
towards genetic analysis of the biological processes.

        7.      In response to your message about whether the US should
start a genome sequencing program, my opinion from the UK (since we will
not be paying!), is overwhelmingly, yes.  I concede all the caveats that
others have mentioned, e.g. we can't do fancy genetics (yet) like the yeast
people, the physical map isn't complete, etc.  However,  I believe
that some sort of genomic sequencing program is necessary now in order to
identify and overcome, through practical experience, the problems inherent
in this type of work and to set things up for a wider effort which must
happen in the next couple of years if the claims of Arabidopsis researchers
are to remain convincing. Despite the organizational faults of the yeast
chromosome III project, some fascinating information came from this work.
And now the organizers presumably know how to organize things better next
time round.   How about a limited program on some well mapped (in the US)
regions of the genome to begin with?  Keep in touch/formalize links with
European genomic sequencers.  Expand if the results are good; presumably
impressive data will convince the funding agencies and the scientific
public.  Expansion should/should not happen depending on what happens in
cDNA mapping/sequencing. Sequencing centers would work best.  Sequence
information should be made public as soon as possible.

        10.     It is now well documented that sequencing is both cost
effective and extremely informative.  If cosmid or P1 contig maps are
available, these regions should be sequenced immediately in centers.  It
turns out that we can sequence faster than the mappers can map so go for
it!!  However, the ultimate problems lie in dealing with the data, and a
massive computational issue arises that is only now
being addressed.  How will the final end user, the biologist, be able to
view the data, search for new features, make new discoveries, etc.?

        12.     Please do not consider sequencing the genomes of other
plants.  At 100 megabases, Arabidopsis will be plenty difficult!

        13.      Many labs are currently sequencing their favorite At
genes; this is tedious and is taking time and resources away from doing
scientific experiments.  In addition, there are many grad students whose
experience in plants amounts to  a lot of sequencing and some experience
with Northerns. With this in mind, perhaps there could be some provision
for the (presumably worthwhile) genes in hand to be sequenced  as a prime
goal of any random sequencing project.  I have not thought through the
issues of timing of release of this information, but believe these issues
could be addressed. I feel strongly that current plant biology is hampered
by laborious techniques (transformation of At would be another example, but
the improved in-planta methods may have overcome this burden to a large
extent), and that any field wide financing would be best directed towards
getting the maximum experiments done.  That's my two cents' worth.

        14.     I hope that the next generation of sequencers and
automation will cut the costs for mega sequencing remarkably. Right now
there are projects for the Arabidopsis community that are more important
than genomic sequencing: mapping the genome, saturated
insertion mutagenesis, sequencing of cDNAs.

        15.     Just that if there is a push to make this happen it would
be good to know that the entire North American committee is behind it, and
is willing to lobby for it both within the Arabidopsis community and with
the funding agencies.  While this is a critical piece of research for all
of us, it may be hard to sell.

        16.     Without powerful reverse genetic techniques, the genomic
sequence information obtained will not be that useful.  Effort should be
put into reverse genetic techniques in parallel.   More realistically,
concentrate efforts on a detailed  physical map with many PCR-based
markers.  I do not see Arabidopsis 'falling behind' other model systems
significantly by waiting to sequence the genomic DNA.

        17.     Push the mapping tools and contig assembly first!  If we
can't find where our genes are then we will not be able to interpret the
sequence as effectively.  The contigs are necessary for completion of the
sequence and proper assembly as well.

        18.     I propose that any funded Arabidopsis genomic sequencing
center, being so efficient as they will claim to be in their application,
include as part (at least 1/7) of its mission the directed sequencing of
virtually any cosmid, plasmid, or even multi-kb PCR band that any
Arabidopsis worker sends it.  This would add efficiency to many projects,
and it would cause immediate payoff of the government resources committed
to the center.  The center need not allow itself to be taken advantage of
(for instance re-sequence plasmids looking for mutations) and it would
spread this effort around as much as possible.  The resulting sequences
would be deposited onto the databases without a delay of more than 6
months, or with no delay at all.  This would circumvent the inefficient DNA
sequencing operations that now consume more research resources than they
should.  It might help to get everyone behind the project, since many would
stand to benefit from it early.

        19.     I think that response 18 is a great idea.

        20.     I have read the responses you sent out.  I found I would
agree with comment #17.
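
The rate estimate in response 4 above can be checked with a short sketch.
This is an illustration only: the function names are hypothetical, the
100 Mb genome size is the figure given in response 12, and the 5 Mb per
center per year throughput is the figure given by respondent 10 under
question A.

```python
import math


def required_rate_mb_per_year(genome_mb, years):
    """Average sequencing rate needed to finish genome_mb in the given years."""
    return genome_mb / years


def centers_needed(rate_mb_per_year, mb_per_center_per_year):
    """Whole number of centers needed to sustain the required rate."""
    return math.ceil(rate_mb_per_year / mb_per_center_per_year)


# 100 Mb genome (response 12), five years left before 2000 (response 4),
# 5 Mb of finished sequence per center per year (respondent 10, question A).
rate = required_rate_mb_per_year(100, 5)
centers = centers_needed(rate, 5)
print(f"required rate:  {rate:.0f} Mb/year")
print(f"centers needed: {centers}")
```

On these figures the 20 Mb/year that response 4 calls for matches the
combined output of the four centers proposed by respondent 10.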
