NAASC Arabidopsis Genome Report - Part 1

Fred Ausubel ausubel at FRODO.MGH.HARVARD.EDU
Fri Oct 7 14:28:40 EST 1994


Dear Arabidopsis Community:

On behalf of the North American Arabidopsis Steering Committee (NAASC), I
am sending a report concerning the proposal to establish a federally funded
Arabidopsis thaliana genome project in the United States.  The report
contains the recommendations of an ad hoc committee representing the
community of Arabidopsis researchers and various government agencies that
was convened by the NAASC and that met in Arlington, Virginia, on June 8 and
9, 1994.

A major goal of the report is to convince U.S. science policy makers to
appropriate funding for an Arabidopsis Genome Project.  This explains the
following type of phraseology:  "...progress in identifying Arabidopsis
genes should be considered an important strategic component for maintaining
the U.S. preeminence in plant biology."

I would welcome any comments that you might have about the report.

Appendix 2 is being sent as a separate message.

Fred Ausubel
********************************************************************************

North American ARABIDOPSIS Steering Committee Workshop
PROPOSAL FOR AN ARABIDOPSIS THALIANA GENOME PROJECT (ATGP)


Executive Summary

        Barriers that once impeded the identification of genes with
important biological functions are vanishing in the 1990s.  High-throughput
genomic sequencing now permits the rapid identification of
large numbers of genes that were previously inaccessible to traditional
genetic analysis.  The rate of gene discovery is now limited only by the
ability of scientists to map and sequence an organism's genome.  Initial
sequencing efforts have focused on so-called model organisms with
relatively small genomes.  The information obtained from these model
genomes is being used to understand gene structure and function in related
organisms.

        Arabidopsis thaliana, a small flowering plant in the crucifer
family, has the smallest genome and the highest gene density so far
identified in a flowering plant.  During the past ten years, Arabidopsis
has become established world-wide as the preferred species for
molecular-genetic studies in the laboratory.  Importantly, because cloned
Arabidopsis genes can be used to identify corresponding genes in all other
plants, continued progress in identifying Arabidopsis genes should be
considered an important strategic component for maintaining the U.S.
preeminence in plant biology.  Genes identified in Arabidopsis will soon
lead to the creation of economically important plants that are more
resistant to pathogen attack, that reduce the use of environmentally toxic
chemicals, that produce foodstuffs with improved nutritional value, or that
yield new kinds of compounds of commercial value.  Increasing our knowledge
of plant genes has almost limitless potential to improve environmental
quality, increase energy production, identify new medicinal compounds, and
enhance our ability to respond to the steady increase in human population
and changing climatic conditions.

        This report contains the recommendations of an ad hoc committee
representing the community of Arabidopsis researchers and various
government agencies that met in Arlington, Virginia, on June 8 and 9, 1994,
to discuss the feasibility of commencing a federally funded, large-scale
Arabidopsis genome project in the United States.  The committee discussed
the impact that an Arabidopsis genome project would have on the progress of
basic plant research as well as on the strategic interests of the United
States as they relate to agriculture, energy and the environment.  The
committee concluded that a large-scale Arabidopsis thaliana Genome Project
(ATGP) should commence as soon as possible.  The committee identified
essential features that should be considered in any proposals for the
initiation of an ATGP.  The committee concluded that one or a limited
number of linked Arabidopsis Genome Centers should be established and that
these Centers will serve as important models for other plant genome
projects in the future.  Finally, the committee recommended that the United
States ATGP be coordinated with a similar effort already underway under the
auspices of the European Community.  The committee recommended that funds
be provided for:

        1.      Completion of the Arabidopsis physical/genetic map and the
creation of sequence-ready clone collections by 1997.

        2.      Pilot sequencing and technology development projects with
the goal of completing 10 megabases of Arabidopsis genome sequence by 1999.

        3.      Subsequent scale-up of pilot projects and complete
sequencing of the 100 megabase Arabidopsis genome by 2004.


Introduction

        The general benefits of genome sequencing are increasingly obvious
as rapid progress is made toward the goal of sequencing complete
chromosomes in other model organisms, such as yeast and Caenorhabditis
elegans (a small nematode).  While classical mutagenesis, genetic analysis
and conventional cloning strategies have uncovered many genes, rough
estimates suggest that no more than 20-25% of an organism's genes can be
identified by classical genetic techniques, even in organisms with a small
fraction of redundant genes.  Plants, including Arabidopsis, generally
exhibit moderate to considerable redundancy in half or more of their
genes.  For this and other reasons, mutations that interfere with or
eliminate expression of many genes are silent.  Thus direct genome
sequencing is the only sure way of identifying all of an organism's genes.
For plants, it follows that genome sequencing will be required for the
identification of most of the economically important genes.

        Because genome sequencing projects are still relatively expensive,
model organisms have been selected as the initial targets of complete
genome sequencing.  The evolutionary kinships among organisms justify this
approach.  Depending on the function of a gene and how well conserved its
sequence is in evolution, at the very least the gene sequences of the model
organism can be used to identify corresponding genes in related species.
Thus the selection of model organisms for full genome sequencing is a
reasonable policy for conserving limited resources, while maximizing
information yield.  Model organisms have been chosen by several criteria,
including the breadth of existing genetic information, small genome size,
and high gene density.  Arabidopsis thaliana was adopted as a model
organism by plant geneticists some years ago because of its small genome
size and rapid reproductive cycle.  At 100 megabases, the Arabidopsis
genome is among the smallest known plant genomes.  It also has a low
repetitive DNA content.

        The wisdom of selecting Arabidopsis as a model organism for higher
plants is becoming increasingly obvious.  Initial sequencing efforts
suggest that the Arabidopsis genome has a very high gene density (~ 1 gene
every 5 kb).  Because higher plants evolved relatively recently and are
therefore closely related, sequence information obtained from Arabidopsis
can be used to identify homologous genes in other plants, including
agronomically important species with much larger genome sizes, higher gene
redundancy and a substantially greater content of repetitive sequences.
Arabidopsis
genes, which are often much easier to clone initially than the
corresponding genes of plants with larger genomes, have already been used
to identify and manipulate genes in agronomically important species.
Scientists at DuPont, for example, have used Arabidopsis genes as probes to
clone fatty acid desaturase genes from a variety of oilseed species such as
soybean and canola.  The cloned genes have been modified and reintroduced
into the species of origin to alter the composition of the oil for improved
health benefits.  Arabidopsis has also been the initial experimental
organism for the introduction of bacterial genes that permit genetically
engineered plants to synthesize a biodegradable thermoplastic,
polyhydroxybutyrate.  The gene system was subsequently transferred to
plants that can be used to produce the plastic on an agricultural scale.
It was the ready availability of Arabidopsis mutants, as well as the fact
that Arabidopsis can be genetically manipulated, that made this work
possible.  Additional genes that have been cloned from Arabidopsis and
that have potential agronomic value include genes that confer resistance
to bacterial and fungal pathogens, that are involved in the synthesis of
plant hormones, that affect the nutritional quality of seeds, and that
alter the time of flowering.

        In addition to its potential agronomic importance, Arabidopsis
genome mapping and sequencing work has already benefited and will
increasingly benefit the community of Arabidopsis researchers.  It
presently takes about three person-years on average to clone an Arabidopsis
gene identified by a mutation using map-based cloning techniques.  The
availability of the complete genomic sequence would vastly simplify and
reduce the cost of identifying most Arabidopsis genes.  Although the
short-term cost of sequencing the entire Arabidopsis genome is substantial
(current costs are about $1.00/base, implying a total cost approaching $100
million by project completion), there are long-term savings and benefits
for the entire plant research community in accelerating research.
Moreover, the high gene density of the Arabidopsis genome implies a high
ratio of informative to uninformative sequence, maximizing the return on
the investment of time and resources.  Finally, the information obtained in
sequencing the genomes of other model organisms widely used in biological
research, such as Escherichia coli, yeast, and C. elegans, has contributed
greatly to our understanding of the biology of these organisms and clearly
demonstrates the important role that genome projects can play in biological
research.  Equally significant advances in our understanding of plant
biology can be expected from an Arabidopsis genome project.
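
        The cost figure quoted above can be checked with a short
back-of-the-envelope calculation.  The sketch below is illustrative only:
it assumes a flat cost of $1.00/base over the whole project and a uniform
gene density of one gene every 5 kb, both of which are simplifications, and
the function names are hypothetical.

```python
# Rough check of the figures quoted in the report.
# Assumptions (simplifications, not project estimates): the cost per base
# stays flat at $1.00 and genes are spaced uniformly every 5 kb.

GENOME_SIZE_BASES = 100_000_000   # 100 megabase genome
COST_PER_BASE_USD = 1.00          # "about $1.00/base"
GENE_SPACING_BASES = 5_000        # "~ 1 gene every 5 kb"


def total_sequencing_cost(genome_bases, cost_per_base):
    """Total cost in dollars of sequencing every base once."""
    return genome_bases * cost_per_base


def implied_gene_count(genome_bases, gene_spacing):
    """Number of genes implied by a uniform gene spacing."""
    return genome_bases // gene_spacing


if __name__ == "__main__":
    cost = total_sequencing_cost(GENOME_SIZE_BASES, COST_PER_BASE_USD)
    genes = implied_gene_count(GENOME_SIZE_BASES, GENE_SPACING_BASES)
    print(f"Total cost:     ${cost:,.0f}")
    print(f"Implied genes:  {genes:,}")
    print(f"Cost per gene:  ${cost / genes:,.0f}")
```

Under these assumptions the total comes to $100 million, in line with the
figure given above.  The implied gene count is only as good as the uniform
spacing assumption and should be read as a rough ceiling.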


Workshop summary

        Overview: To assess the feasibility and desirability of a federally
funded Arabidopsis genome project, the North American Arabidopsis Steering
Committee organized and convened a workshop in Arlington, Virginia, on June
8-9, 1994.  The workshop participants included the elected members of the
North American Arabidopsis Steering Committee.  Representatives from the
National Science Foundation, the U.S. Department of Agriculture, the
Department of Energy, the NIH-sponsored human genome project, and the
European Community were present as observers.  Two scientists involved with
the human genome project were also present as technical advisors.  A list
of participants is given in Appendix 1.

        The general goal of the workshop was to assess progress toward
meeting the goals of mapping and sequencing the Arabidopsis genome and to make
specific recommendations to the National Science Foundation to direct
future US efforts in the Multinational Coordinated Arabidopsis Genome
Research Project.  A secondary goal was to outline in general terms the
main issues which should be addressed in future proposals concerning the
development of new or expanded Arabidopsis sequencing centers.

        The workshop commenced with a summary of the recent Arabidopsis
genome conference held at Cold Spring Harbor's Banbury Center and a
discussion of current funding for Arabidopsis genome research.  Mike Bevan
and Chris Somerville presented overviews of the EC sequencing program (ESSA
or European Scientists Sequencing Arabidopsis) and the Michigan State
University cDNA sequencing project, respectively.  Mary Clutter joined the
workshop participants for a brief discussion of Arabidopsis genome research
funding within NSF.  Jen-i Mao and Mark Johnston discussed two different
approaches to sequencing taken at Collaborative Research (the multiplex
approach) and by the C. elegans sequencing group at Washington University
(sequencing machines).  The committee discussed the responses from the
Arabidopsis community to a questionnaire on the Arabidopsis genome project.
Finally, workshop participants discussed the present status and future of
the US Arabidopsis genome project, commencing with a detailed consideration
of the rationale for genome mapping and sequencing and commentary on the
benefits of even the limited effort to date.  The following issues were
discussed in depth:  Should there be an organized Arabidopsis genome
project given the current state of Arabidopsis research?  What is the
relative priority of complete genome sequencing compared to completion of a
physical map, adding more PCR-based mapping markers to the map, or
single-pass cDNA sequencing?  Who should pay for an Arabidopsis genome
project, how should it be organized, how long will it take, and how much
will it cost?  How will a US-funded ATGP be coordinated with ESSA?

        Setting Priorities:  Before the workshop, a questionnaire (see
Appendix 2) designed to obtain feedback from the Arabidopsis community on
the desirability of an Arabidopsis genome project was posted on the
Arabidopsis electronic newsgroup.  More than 20 responses were obtained
and were reviewed and discussed during the course of the workshop.
Although most respondents supported the concept of an ATGP, several
respondents suggested that a high-density genetic map consisting of
PCR-based markers be completed before large-scale sequencing is undertaken.
Indeed, it was noted that even the relatively small number of DNA markers
and the incomplete physical map had already been useful to many
investigators and that there had been extremely heavy and immediate demand
for the cDNA clones that were being sequenced at MSU and in France.  The
workshop participants agreed
that a high-density genetic/physical map would be of immediate benefit to
the community.  On the other hand, because it takes considerable time to
get a sequencing organization equipped, trained and functioning
efficiently, there was general agreement among workshop participants that it
is essential to begin setting genome sequencing goals immediately and to
initiate pilot sequencing projects in parallel with other aspects of genome
analysis.

        Progress in genome research:  The current efforts in several
laboratories to establish links between the genetic and physical maps of
the Arabidopsis genome greatly facilitate the map-based cloning of genes.
While many mutations and genes have been mapped by the use of restriction
fragment length polymorphisms (RFLPs), genetic markers based on the
polymerase chain reaction (PCR) are being developed for Arabidopsis.
Cleaved amplified polymorphic sequence (CAPS) and simple sequence length
polymorphism (SSLP) markers can be used for rapid mapping of plant
mutations and as a dense set of sequence tagged sites (STSs) for the
construction of a physical map of the Arabidopsis genome using an anchoring
strategy.  In a collaborative effort, investigators at the John Innes
Institute, the University of Pennsylvania and Massachusetts General
Hospital are developing an overlapping set of YACs covering the entire
genome.  With newly available YAC libraries, total genome coverage in YACs
is now estimated to be approximately 60-70%, with even greater coverage on
chromosome 4 (about 80%).  Furthermore, in preparation for phase one of the
European Scientists Sequencing Arabidopsis (ESSA), restriction mapping of
500 kb of cosmids from the top of chromosome 4 has been completed and
distributed to the participating laboratories.  In addition to facilitating
the cloning of genes identified solely by phenotype, physical mapping of
the genome generates the starting materials for rapid and efficient
sequencing and is a key component of a genome project.

        Another important component of the ATGP is three cDNA sequencing
projects that are underway in Europe, Canada and the US.  The European goal
is to sequence (from both ends) 3000 unique cDNA fragments (expressed
sequence tags or ESTs).  ESSA scientists are also mapping their ESTs to YAC
clones, regardless of whether the YAC clone has been anchored.  Canadian
scientists are planning to map 600 ESTs.  The US project has already
entered 2500 ESTs in publicly available databases and is on the verge of
entering an additional 4000 (these have been sequenced only in one
direction and relatively little effort has been devoted to eliminating
redundancy).  The exact number of different gene transcripts represented
among this collection of ESTs is unknown; hence the fraction of the
estimated 15,000-16,000 Arabidopsis genes represented in this collection cannot
be determined at present.  The workshop participants concluded that mapping
cDNAs had merit because it facilitates connecting a mapped mutation to its
cognate gene even in the absence of genomic sequence.

        Goals for Arabidopsis genome research:  Workshop participants
agreed that a pilot genome sequencing project should begin immediately.
More specifically, the NAASC recommends that a specific federal program be
developed to support Arabidopsis genome sequencing and associated
technology development with the goal of completion of the entire genomic
sequence by the year 2004.  The following steps should be undertaken to
achieve this goal:

        1.      A call for proposals to conduct pilot Arabidopsis
sequencing projects.  This should be in the form of RFPs to make it
possible to attract proposals from outside the Arabidopsis community.

        2.      Establishment of several sequencing centers with the
short-term goal of obtaining 10 megabases of genomic sequence within 3 years
from the start of funding (a similar goal to ESSA).  The participation of
existing DNA sequencing centers, as well as companies with relevant
expertise, is encouraged.  The purpose of these pilot projects will be to
establish the feasibility of, and to develop a detailed strategy for,
completing the sequencing of the entire Arabidopsis genome.  To achieve
cost-effectiveness, it is not envisioned that this program will fund a
large number of small-scale sequencing projects.  Pilot sequencing projects
should include substantial mapping components, including the goal of
finding and mapping at least 1000 PCR-based markers, to generate the
appropriate templates for sequencing.

        3.      Significant expansion of the pilot sequencing centers to
achieve the goal of completion of the entire sequence by 2004.  It is noted
that this phase will require a substantial commitment of equipment,
supplies, and personnel.

        4.      Although specific goals were not set, workshop participants
emphasized that a key feature of genome research was the development of
methods for the identification of gene function.  Some of the more
promising methodologies for Arabidopsis include antisense mRNA constructs,
co-suppression and transposon tagging.

APPENDIX ONE

Participants:

Dr. Fred Ausubel
Department of Molecular Biology
Massachusetts General Hospital
Boston, MA 02114

Dr. Mike Bevan
John Innes Institute
Colney Lane, Norwich
NR4 7UJ, United Kingdom

Dr. Joanne Chory
Plant Biology Laboratory
Salk Institute
PO Box 85800
San Diego, CA 92186-5800

Dr. Joseph Ecker
Department of Biology
University of Pennsylvania
Philadelphia, PA 19104-6018

Dr. Mark Estelle
Department of Biology
Indiana University
Bloomington, IN 47405

Dr. Nina Fedoroff
Department of Embryology
Carnegie Institution of Washington
115 West University Parkway
Baltimore, MD 21210

Dr. Howard Goodman
Department of Molecular Biology
Massachusetts General Hospital
Boston, MA 02114

Dr. Mark Johnston
Department of Genetics
Washington University School of Medicine
4566 Scott Avenue
St. Louis, MO 63110-1031

Dr. Jen-i Mao
Collaborative Research Inc.
1365 Main Street
Waltham, MA 02154

Dr. David Meinke
Department of Botany
Oklahoma State University
Stillwater, OK 74078

Dr. Chris Somerville
Plant Biology Department
Carnegie Institution of Washington
Stanford, CA 94305-4170

Observers:

Dr. Machi Dilworth
National Science Foundation
402 Wilson Blvd., Rm #685
Arlington, VA 22230

Dr. Ed Kalaikau
Program Director
Plant Genome Program
National Research Initiative Competitive Grants
CRS/USDAAG Box 2241
Washington, D.C. 20250-2241

Dr. Jerome P. Miksche
Director
Office of Plant Genome Research
Agricultural Research Service
U.S. Department of Agriculture
Bldg. 005, BARC-West,
Beltsville, MD 20705

Dr. Robert Rabson
Director
Division of Energy Biosciences
Office of Basic Energy Sciences
U.S. Department of Energy
ER-17, GTN
Washington, D.C. 20545

Dr. Robert Strausberg
National Center for Human Genome Research
National Institutes of Health
9000 Rockville Pike
Bethesda, MD 20892





