Bermuda Quality Sequences

Rick Wilson rwilson at alu.wustl.edu
Tue Dec 16 08:48:43 EST 1997

rhmorse <rhmorse at genbiosys.com> wrote:

>What is meant by "Bermuda Quality Sequences"?  What is the level of
>quality?  What is the origin of the term?  What data bases or organizations
>recognize this level of quality?
>Randy Morse
>rhmorse at genbiosys.com

In Feb. 1996, the Wellcome Trust organized a conference on the island
of Bermuda to discuss various issues pertaining to large scale human
genome sequencing.  Most non-commercial laboratories with a chance of
contributing significantly to the human genome sequencing effort were
represented, as were major funding agencies and database services from
the U.S., U.K., Europe and Japan.  Among the topics discussed was the
expected quality of human genomic sequence data from the various
participating laboratories.

As originally proposed by Waterston & Sulston, a first cut at the human
"sequence map" would allow for a few well-annotated gaps (e.g. at
tandem repeats, homopolymeric stretches, Alu elements, etc.) and an
average error rate of approximately 1:1000.  The Bermuda participants
agreed that, at least for the first few years of large scale human
sequencing, all laboratories should strive for contiguity (i.e., no
gaps) and an average error rate of 1:10,000.  These parameters would
produce a sequence on par with that generated for C. elegans and
S. cerevisiae.
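
To put the two error rates in concrete terms, here is a rough
arithmetic sketch of the number of errors one would expect in a single
clone at each rate.  The 40,000 bp clone length and the function name
are assumptions chosen for illustration only; they were not part of the
Bermuda discussion.

```python
from fractions import Fraction

# Average error rates quoted above, expressed as errors per base.
FIRST_CUT_RATE = Fraction(1, 1000)    # original Waterston & Sulston proposal
BERMUDA_RATE = Fraction(1, 10_000)    # target agreed on in Bermuda


def expected_errors(length_bp, errors_per_base):
    """Expected number of erroneous bases in a sequence of length_bp."""
    return length_bp * errors_per_base


# Hypothetical clone length, chosen only for illustration.
clone_length_bp = 40_000

print(expected_errors(clone_length_bp, FIRST_CUT_RATE))   # 40
print(expected_errors(clone_length_bp, BERMUDA_RATE))     # 4
```

In other words, the tighter target reduces the expected errors in a
clone of that size from about 40 to about 4.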

Another important topic of the meeting was data release.  The
Waterston-Sulston proposal for sequencing the human genome also called
for immediate release of assembled, unedited data via the Internet, as
was already being done for the C. elegans genome.  A
result of the conference was the evolution of the "Phase I", "Phase
II", "Phase III" categories now seen in GenBank and used for
classifying genomic sequence data.  Phase I data is assembled but
unedited; Phase II has been partially edited; Phase III is fully
contiguous finished sequence with a projected average error rate of no
more than 1:10,000.  The Phase I and Phase II categories indicate
preliminary (i.e., early release) data.  Phase III category data, also
referred to as "Bermuda Quality Sequences", are completed sequences.

Can and will all participating laboratories achieve this level of data
quality?  Currently, the NHGRI is conducting the second installment of
sequence quality assessment in which participating labs exchange the
raw data for a number of genomic clones (cosmids, BACs or PACs) chosen
at random by NHGRI staff.  The raw data is reassembled and edited to
produce a finished sequence which can be compared back to the existing
GenBank or EMBL entry.  With this approach, the major causes of errors
- poor quality raw data and assembly and editing errors - can be
assessed.  The results from this exercise will be utilized in the
review process for NIH competitive renewals.
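
As a very simplified illustration of the comparison step, the sketch
below counts the positions at which two versions of the same clone
disagree and checks the count against the 1:10,000 target.  It is not
the procedure NHGRI uses.  It assumes the two sequences have already
been aligned to equal length, with '-' marking an inserted or deleted
base, and the function names are made up for this example.

```python
def count_discrepancies(database_entry, reassembled):
    """Count aligned positions where the two sequences disagree.

    Both inputs are assumed to be pre-aligned strings of equal length,
    with '-' standing for a base missing from one of the two versions.
    """
    if len(database_entry) != len(reassembled):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(database_entry.upper(), reassembled.upper())
    return sum(1 for a, b in pairs if a != b)


def within_target(discrepancies, length_bp, bases_per_error=10_000):
    """True if the discrepancy count is at most 1 per bases_per_error bases."""
    return discrepancies * bases_per_error <= length_bp


# Toy example: one disagreement (a gap) in six aligned positions.
print(count_discrepancies("ACGT-A", "ACGTTA"))   # 1

# At the 1:10,000 target, a 40,000 bp clone may show at most 4.
print(within_target(4, 40_000))   # True
print(within_target(5, 40_000))   # False
```

Note that a discrepancy only shows that the two versions differ.  It
does not say which one is wrong, so each difference would still have to
be checked against the raw data.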

Hope this answers your questions!


Richard K. Wilson, Ph.D.
Research Associate Professor of Genetics, Co-Director
Genome Sequencing Center
Washington University School of Medicine
4444 Forest Park Blvd., Box 8501
St. Louis, MO   63108   USA
Phone: (314) 286-1804  FAX: (314) 286-1810
rwilson at watson.wustl.edu
www: http://genome.wustl.edu/gsc/staff/rwilson/wilsonhmpg.html
