Drosophila genomic sequences from the EDGP now available.

Takis Benos benos at ebi.ac.uk
Sat Aug 23 13:57:39 EST 1997


The European Drosophila Genome Project, a consortium of groups funded by
the European Union to sequence divisions 1-3 of the X chromosome of
Drosophila melanogaster, announces the availability of sequence data
and other resources from its WWW page:


The policy of the EDGP is to deposit raw cosmid sequences on an ftp site
that can be accessed from this page, or directly at:


These sequences are also deposited in the EMBL Nucleic Acid Sequence
data library in its HTGS (High Throughput Genomic Sequences) division.
These HTGS sequences are automatically annotated with computationally
predicted coding regions and are in the submitted/ directory as EMBL
format files. The rawdata/ directory has the unannotated sequences in FASTA
format. The annotated/ directory has sequences after 'intelligent'
annotation - that is, a combination of automatic predictions, similarity
matches and human intervention. Note that an HTGS sequence record
retains the same EMBL accession number when the sequence has been
more fully annotated.
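
For readers who fetch files from the rawdata/ directory, the following is a
minimal sketch of reading FASTA-format records in Python. The function name
read_fasta and the sample record names are illustrative assumptions, not
part of the EDGP distribution, and the header layout of the actual EDGP
files is not described in this announcement.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-format text stream.

    A record starts with a line beginning with '>' and continues until
    the next such line; sequence lines are joined without whitespace.
    """
    header = None
    chunks = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical two-record input; real files would be opened with open().
sample = io.StringIO(">cosmidA example\nACGT\nTTGA\n>cosmidB\nGGCC\n")
records = list(read_fasta(sample))
for name, seq in records:
    print(name, len(seq))
```

The generator reads one line at a time, so only the record currently being
assembled is held in memory, which matters for long cosmid-sized sequences.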

A BLAST server can also be accessed from the EDGP homepage. In close
collaboration with the Berkeley Drosophila Genome Project we offer
a BLAST service against various sets of Drosophila sequences: all
Drosophila nucleic acid or protein sequences, Drosophila EST sequences
(Berkeley) or STS sequences (European Drosophila Mapping Project),
P-element insertion sites (Berkeley), Drosophila transposons and repeat
sequences, as well as the Berkeley (P1) and European (cosmid) genomic
sequences.


The sequence_sets/ directory on the ftp server provides various data sets
of Drosophila sequences that can be used for training gene prediction
programs and may be of interest for other purposes. These include a coding_sequence
set (1336 sequences), a 'clean' transposable element sequence set
and a set of miscellaneous repetitive sequences. These have been
put together by Takis Benos and Michael Ashburner with the close
collaboration of Suzanna Lewis (Berkeley) and Martin Reese (LBL). Read
the file dros_sequence_sets.README for details of how these sequence sets
were made and for future plans. These curated sequence sets will be
updated whenever necessary.

We thank the Berkeley Drosophila Genome Project for their close
collaboration and Rodrigo Lopez of the EBI for his BLAST server interface.

Questions or comments to: edgp at ebi.ac.uk.

