New ab initio annotation of sequences in the Drosophila genome
(20622 genes and 75935 exons)
available for public use at
http://www.softberry.com/inf/infodb.shtml
Search in Drosophila proteins:
http://www.softberry.com/scan/scan.shtml
Recently, the nucleotide sequence of nearly the entire euchromatic portion of the
Drosophila genome (~120 MB) was determined (Adams et al., 2000).
We annotated these sequences (at http://genomic.sanger.ac.uk/inf/infodb.shtml),
predicting genes with the Fgenesh program and checking the similarity of each exon
to the EST and protein databases with the Blast program (Altschul et al., 1997).
Since then, additional sequencing and sequence improvements have been provided.
We repeated the ab initio prediction on the improved sequences and annotated exons
with PfamA domains. The results of this analysis are presented in Table 1
and can be seen in the InfoGene database at http://www.softberry.com/inf/infodb.shtml.
In this table we present, in addition to the full set of computer-predicted genes,
a filtered set of genes and exons from which the most unreliable genes have been removed.
We use 2 criteria: 1) remove genes whose total protein-coding region is shorter
than 30 amino acids and 2) remove genes whose total exon score is < 15.
Such filtering has been shown to improve the accuracy of prediction
(Salamov, Solovyev, 2000, Genome Res., 10, 516-522). Note that the 20622
genes include some pseudogenes and genes of mobile elements.
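The two filtering criteria above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the `Exon` and `Gene` records are hypothetical and do not reflect the Fgenesh output format, and the sketch assumes that the coding length in amino acids is the summed exon length in nucleotides divided by 3.

```python
from dataclasses import dataclass
from typing import List

MIN_CODING_AA = 30      # criterion 1: genes shorter than this are removed
MIN_EXON_SCORE = 15.0   # criterion 2: genes scoring below this are removed


@dataclass
class Exon:             # hypothetical record, not the Fgenesh format
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    score: float


@dataclass
class Gene:             # hypothetical record, not the Fgenesh format
    name: str
    exons: List[Exon]


def coding_length_aa(gene: Gene) -> int:
    """Total coding length in amino acids (assumed: summed exon nt // 3)."""
    nucleotides = sum(e.end - e.start + 1 for e in gene.exons)
    return nucleotides // 3


def total_exon_score(gene: Gene) -> float:
    """Sum of the scores of all exons in the gene."""
    return sum(e.score for e in gene.exons)


def is_reliable(gene: Gene) -> bool:
    """Keep a gene only if it passes both criteria."""
    return (coding_length_aa(gene) >= MIN_CODING_AA
            and total_exon_score(gene) >= MIN_EXON_SCORE)


def filter_genes(genes: List[Gene]) -> List[Gene]:
    """Return the genes that survive filtering, in their original order."""
    return [g for g in genes if is_reliable(g)]
```

A gene with exactly 30 amino acids or a total score of exactly 15 is kept here, since the criteria remove only genes strictly below those thresholds.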
The Blast/DBscan search against the predicted Drosophila proteins is provided at this site:
http://www.softberry.com/scan/scan.shtml.
The sequences of exons and the gene annotation data can be copied from
http://www.softberry.com/inf/dro_ann.shtml for use in
creating microarray oligos.
Table 1. Summary of predicted genes and proteins in Drosophila genome sequences
                           X     2L     2R     3L     3R      4      Y  Unknown   Total
Size (MB)               22.2   23.0   21.4   24.1   28.3    1.2   0.02      4.6   124.8
Genes predicted         4071   4610   4573   4851   4962    133      1      691   24884
Genes after filtering   3349   3768   3915   4017   4962    105      1      504   20622
United/PfamA dom        1138   1193   1287   1216   1654     58      0       76    6622
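As a rough way to compare chromosome arms of different sizes, the filtered gene counts in Table 1 can be converted to genes per MB. The sketch below only re-expresses the table values; the dictionary and function names are illustrative assumptions.

```python
# Values copied from Table 1 (sizes in MB, gene counts after filtering).
SIZE_MB = {"X": 22.2, "2L": 23.0, "2R": 21.4, "3L": 24.1,
           "3R": 28.3, "4": 1.2, "Y": 0.02, "Unknown": 4.6}
FILTERED_GENES = {"X": 3349, "2L": 3768, "2R": 3915, "3L": 4017,
                  "3R": 4962, "4": 105, "Y": 1, "Unknown": 504}


def genes_per_mb(arm: str) -> float:
    """Filtered gene count divided by sequence size for one column of Table 1."""
    return FILTERED_GENES[arm] / SIZE_MB[arm]


if __name__ == "__main__":
    for arm in SIZE_MB:
        print(f"{arm:8s} {genes_per_mb(arm):7.1f} genes/MB")
```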
Interestingly, these ab initio predictions by FGENESH yield, after filtering, about
7 thousand more genes than were annotated by the Celera scientists and their academic coauthors.
We did not, however, remove genes of mobile elements. Because no gene prediction approach
is perfect, it will be useful to analyze all the different predictions to identify new genes.
---