We do not expect a huge rush to instantiate very large records for
genomes and chromosomes which are currently split into multiple
pieces.
For example, the Drosophila genome currently comprises on the order
of 1200 records. Single records for each chromosome (or chromosome
arm) will not replace these until the submitters of the genome,
and NCBI's processing pipeline, are ready to accommodate that change.
My guess is that the first post-June update of the Drosophila genome,
possibly by Fall 2004, might result in single records for each
chromosome/arm (assuming that the submitter desires this).
If so, then the six current CON-division records for Drosophila:
AE014134
AE013599
AE014296
AE014297
AE014135
AE014298
would become 'normal' (non-CON) records, containing sequence data
of up to 28 Mbp.
The most immediate change after June 2004 will involve
bacterial genomes. New complete bacterial genomes will no
longer be broken into pieces, so GenBank Update and Release products
will contain records in the 1 to 5 Mbp range quite soon.
Older bacterial genomes split into pieces over the past several years
will eventually be replaced by single records, but a timetable for this
has not yet been established.
Note that several classes of large sequences already exist:
HTGS Phase 0, 1, and 2; Whole Genome Shotgun contigs; and very
large, dispersed, eukaryotic genes. As far as I know, BLAST
already handles cases like these (up to several Mbp).
But I will inquire with the BLAST group to determine if any software
changes are anticipated for sequences of hundreds of Mbp. If so, then yes,
we will certainly distribute a BLAST release prior to the distribution
of sequences of that size, and announce that release here and via
other channels.
Reminder: Some sample large records are available at the NCBI FTP site:
ftp://ftp.ncbi.nih.gov/genbank/LargeSeqs
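
For anyone who wants to check how local tools cope with long records,
a minimal sketch along the following lines may be a useful starting
point. It assumes input in GenBank flat-file format, where each record
begins with a LOCUS line giving the record name and the sequence length
followed by "bp". The function name long_records and the 1 Mbp default
threshold are illustrative assumptions, not part of any NCBI software.

```python
import re
from typing import Iterable, Iterator, Tuple

# A LOCUS line starts each flat-file record: name, length, then "bp".
LOCUS_RE = re.compile(r"^LOCUS\s+(\S+)\s+(\d+)\s+bp\b")


def long_records(
    lines: Iterable[str], min_bp: int = 1_000_000
) -> Iterator[Tuple[str, int]]:
    """Yield (locus_name, length) for records at least min_bp long.

    min_bp is an arbitrary illustrative threshold (hypothetical).
    """
    for line in lines:
        match = LOCUS_RE.match(line)
        if match is None:
            continue  # not a LOCUS line
        length = int(match.group(2))
        if length >= min_bp:
            yield match.group(1), length
```

Because a file object yields lines one at a time, passing an open file
(for example a downloaded sample from the directory above) scans it
without loading whole sequences into memory.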
Thanks very much for your inquiry, Francis; we encourage any others
with concerns about the length limit removal to ask questions via
this group.
Regards,
Mark Cavanaugh
GenBank
NCBI/NLM/NIH/DHHS
> X-Original-To: genbankb-list at hgmp.mrc.ac.uk
> To: genbank at net.bio.net
> Date: 26 Apr 2004 03:50:36 +0100
> From: Francis Ouellette <francis at bioinformatics.ubc.ca>
> Subject: Re: GenBank Release 141.0 Now Available
>
> Dear Mark,
>
> a question via the GenBank newsgroup:
>
> As per these announcements over the last year, the next release of
> GenBank will have unlimited lengths in GenBank record. Do you
> expect this will have much of an impact in the next release? Or do
> you anticipate these longer records to trickle in over time?
>
> For example, does NCBI plan to re-release all bacterial genomes by
> next release of GenBank in this unified single record? Will there
> be a public release of BLAST ahead of the next GenBank release?
> (so that those of us who maintain local BLAST servers for our
> communities can have a BLAST server ready for this new data?)
>
> cheers, and thanks for the wonderful work
>
> f.
---
- gttaacaattaaagagtgtttatcgaaattcattatatagtggtttatatagaccacttc
-
- GenBank newsgroup see: http://www.bio.net/hypermail/genbankb/
- GENBANKB e-mail: messages sent to genbankb at net.bio.net
- subscribe: e-mail biosci-server at net.bio.net with: subscribe genbankb
- unsub: e-mail biosci-server at net.bio.net with: unsubscribe genbankb
- GenBank on the WWW, see: http://www.ncbi.nlm.nih.gov/Genbank/
- problems with GENBANKB? E-mail moderator: francis at bioinformatics.ubc.ca