FGENESB - Finding operons and genes in microbial genomes
The current FgenesB version implements a simple operon prediction model
based on the distances between genes. It correctly recognizes about 70% of
single transcription units and predicts about 43% of operons exactly
(~92% at least partially). Work is under way to improve the accuracy of
operon identification using promoters, terminators and other features.
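
To make the distance-based idea concrete, here is a minimal sketch in Python.
It is not the FgenesB model: the 100 bp cutoff (MAX_GAP), the Gene type and
the function name are assumptions made for illustration only. Adjacent genes
on the same strand are put in one transcription unit when the gap between
them is small; a unit with more than one gene is reported as an operon. The
coordinates are those of the first five genes in the Bacillus anthracis
example in this posting.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    start: int   # 1-based start coordinate
    end: int     # 1-based end coordinate (inclusive)
    strand: str  # "+" or "-"


# Hypothetical cutoff in bp, chosen for illustration; not an FgenesB parameter.
MAX_GAP = 100


def group_by_distance(genes: List[Gene], max_gap: int = MAX_GAP) -> List[List[Gene]]:
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap."""
    units: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if units:
            prev = units[-1][-1]
            gap = gene.start - prev.end - 1  # negative when the genes overlap
            if gene.strand == prev.strand and gap <= max_gap:
                units[-1].append(gene)
                continue
        units.append([gene])
    return units


genes = [
    Gene(273, 953, "+"),
    Gene(1049, 2044, "+"),
    Gene(2031, 2444, "-"),
    Gene(2552, 3904, "-"),
    Gene(4179, 4412, "+"),
]
units = group_by_distance(genes)
operons = [u for u in units if len(u) > 1]
print(len(units), "transcription units,", len(operons), "operon(s)")
```

With this cutoff the first two genes (gap of 95 bp) fall into one operon and
the remaining three stay single transcription units. A pure distance rule
like this cannot use promoter or terminator evidence.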
We have developed a new FgenesB-annotator script that finds similar proteins
in public databases and annotates the predicted genes. The script can also
discover additional low-scoring genes if they have known homologous proteins.
For an example annotation of the Bacillus anthracis A2012 main chromosome, see
http://www.softberry.com/bact/ba.ann
FgenesB-annotator: Finding operons and genes in microbial genomes
(Softberry Inc.)
Seq name: gi|20520073|gb|AAAC01000001.1| Bacillus anthracis A2012 main
chromosome, whole genome shotgun sequence
Length of sequence - 5093554 bp Parameters: Bacillus anthracis
Number of predicted genes - 5917, with homology - 5480
Number of transcription units - 3568, operons - 1224
   N   Tu/Op     S        Start       End  Score
   1    1 Op  1  +  CDS     273 -     953    692  ## MgtC, MgtC family [Bacillus anthracis A2012] [Bacillus anthr
   2    1 Op  2  +  CDS    1049 -    2044    625  ## Similar_to_GB_hypothetical
   3    2 Tu  1  -  CDS    2031 -    2444    461  ## Similar_to_GB_hypothetical
   4    3 Tu  1  -  CDS    2552 -    3904   1599  ## PGI, Phosphoglucose isomerase [Bacillus anthracis A2012] [Ba
   5    4 Tu  1  +  CDS    4179 -    4412    393  ## Similar_to_GB_hypothetical
   6    5 Tu  1  -  CDS    4525 -    4869    470  ## S1, Ribosomal protein S1-like RNA-binding domain [Bacillus a
   7    6 Op  1  -  CDS    5122 -    6312   1010  ## aminotran_1_2, Aminotransferase class I and II [Bacillus ant
   8    6 Op  2  -  CDS    6309 -    6806    639  ## ASNC_trans_reg, AsnC family [Bacillus anthracis A2012] [Baci
   9    7 Tu  1  +  CDS    6954 -    7916   1144  ## 2-Hacid_DH_C, D-isomer specific 2-hydroxyacid dehydrogenase,
  10    8 Tu  1  +  CDS    8026 -    8865    644  ## abhydrolase, alpha/beta hydrolase fold [Bacillus anthracis A
  11    9 Tu  1  -  CDS    8895 -    9146    292  ## Similar_to_GB_hypothetical
  12   10 Tu  1  +  CDS    9264 -   10415    886  ## aminotran_1_2, Aminotransferase class I and II [Bacillus ant
  13   11 Tu  1  -  CDS   10600 -   11097    539  ## sodcu, Copper/zinc superoxide dismutase (SODC) [Bacillus ant
  14   12 Tu  1  -  CDS   11208 -   11384    264  ## Similar_to_GB_hypothetical
  15   13 Tu  1  +  CDS   11550 -   11933    526  ## Similar_to_GB_hypothetical
  16   14 Tu  1  -  CDS   11975 -   12598    605  ## EXOIII, exonuclease domain in DNA-polymerase alpha and epsil
  17   15 Tu  1  +  CDS   12888 -   14213   1615  ## ArsB, Arsenical pump membrane protein [Bacillus anthracis A2
  18   16 Tu  1  -  CDS   14272 -   14739    418  ## Similar_to_GB_hypothetical
  19   17 Tu  1  +  CDS   14858 -   15571    661  ## Similar_to_GB_hypothetical
  20   18 Tu  1  +  CDS   15919 -   17295   1497  ## HGTP_anticodon, Anticodon binding domain [Bacillus anthracis
  21   19 Tu  1  -  CDS   17333 -   17716    496  ## DUF157, Uncharacterized protein PaaI, COG2050 [Bacillus anth
  22   20 Op  1  +  CDS   17812 -   18555    500  ## Similar_to_GB_hypothetical
  23   20 Op  2  +  CDS   18606 -   19199    756  ## BioY, BioY family [Bacillus anthracis A2012] [Bacillus anthr
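
For readers who want to post-process output like the listing above, the
following Python sketch parses the gene rows into records. The field names
and the reading of the columns (gene number, transcription-unit number,
Op/Tu flag, position within the unit, strand, start, end, score, annotation)
are assumptions inferred from the example, not a documented FgenesB format.
The parser also accepts rows whose annotation text has been wrapped onto the
following line, as happens when the output is pasted into e-mail.

```python
import re
from typing import Dict, List

# One gene row: N, unit number, Op/Tu, position in unit, strand, "CDS",
# start - end, score, then "##" and an optional annotation.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(Op|Tu)\s+(\d+)\s+([+-])\s+CDS\s+"
    r"(\d+)\s+-\s+(\d+)\s+(\d+)\s+##\s*(.*)$"
)


def parse_rows(text: str) -> List[Dict[str, object]]:
    """Parse gene rows; non-matching lines after a row are treated as
    a continuation of that row's annotation (an assumption)."""
    rows: List[Dict[str, object]] = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            n, unit, kind, pos, strand, start, end, score, note = m.groups()
            rows.append({
                "n": int(n), "unit": int(unit), "kind": kind, "pos": int(pos),
                "strand": strand, "start": int(start), "end": int(end),
                "score": int(score), "note": note.strip(),
            })
        elif rows and line.strip():
            rows[-1]["note"] = (str(rows[-1]["note"]) + " " + line.strip()).strip()
    return rows


sample = """\
1 1 Op 1 + CDS 273 - 953 692 ## MgtC, MgtC
family [Bacillus anthracis A2012] [Bacillus anthr
2 1 Op 2 + CDS 1049 - 2044 625 ##
Similar_to_GB_hypothetical
3 2 Tu 1 - CDS 2031 - 2444 461 ## Similar_to_GB_hypothetical
"""
rows = parse_rows(sample)
for r in rows:
    print(r["n"], r["kind"], r["strand"], r["start"], r["end"], r["note"])
```

Because any non-matching line after a row is taken as a continuation, the
text passed in should contain only the table, not the surrounding prose.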
The new FgenesB is the fastest (E. coli genome analyzed in ~14 sec) and most
accurate ab initio bacterial gene prediction program available:
http://www.softberry.com/berry.phtml?topic=fgenesb
It uses parameters learned for different bacteria by the FgenesB-train
script, whose only input is the new bacterial sequence. The script
automatically creates a file of gene prediction parameters for the analyzed
organism. Creating such a file for a genome like E. coli takes only
~10 minutes. If you need parameters for a new bacterium, please contact
Softberry Inc.; we can include them in the web list.
The algorithm is based on pattern recognition of different types of signals
and on Markov chain models of coding regions. The optimal combination of
these features is then found by dynamic programming, and a set of gene
models is constructed along the given sequence.
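
The dynamic programming step can be illustrated with a deliberately
simplified Python sketch. It assumes each candidate gene already carries a
single combined score and that the chosen genes may not overlap; it then
picks the highest-scoring set (classic weighted interval scheduling). The
candidates, scores and function name are invented for illustration. FgenesB
itself combines signal and Markov-model scores in ways not described here,
and real bacterial genes can overlap slightly, so this is a stand-in for the
general technique, not the program's actual procedure.

```python
from bisect import bisect_right
from typing import List, Tuple

Candidate = Tuple[int, int, float]  # (start, end, score), inclusive coordinates


def best_gene_set(cands: List[Candidate]) -> Tuple[float, List[Candidate]]:
    """Return the maximum total score and one non-overlapping set achieving it."""
    cands = sorted(cands, key=lambda c: c[1])  # order by end coordinate
    ends = [c[1] for c in cands]
    best = [0.0] * (len(cands) + 1)  # best[i]: optimum over the first i candidates
    take = [False] * len(cands)      # whether candidate i is used in best[i + 1]
    prev = [0] * len(cands)          # how many candidates end before candidate i starts
    for i, (start, _end, score) in enumerate(cands):
        prev[i] = bisect_right(ends, start - 1, 0, i)
        with_i = best[prev[i]] + score
        if with_i > best[i]:
            best[i + 1] = with_i
            take[i] = True
        else:
            best[i + 1] = best[i]
    # Trace back from the end to recover the chosen candidates.
    chosen: List[Candidate] = []
    i = len(cands)
    while i > 0:
        if take[i - 1]:
            chosen.append(cands[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return best[-1], chosen


# Made-up candidates; the scores are not FgenesB scores.
candidates = [(1, 300, 5.0), (250, 600, 8.0), (310, 500, 2.0), (610, 900, 4.0)]
total, chosen = best_gene_set(candidates)
print(total, chosen)
```

Here the single long candidate (250, 600) beats the two shorter ones it
conflicts with, which a greedy left-to-right choice would miss.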
----------------------------------------------------------------------------