Genes/Operons in pathogenic organisms: Mycobacterium tuberculosis, Yersinia pestis

Victor Solovyev softberry at softberry.com
Tue Nov 19 15:36:39 EST 2002

Genes/Operons in pathogenic organisms: Mycobacterium tuberculosis, Yersinia
pestis and others

Using the Softberry fgenesB-annotator script, which predicts genes and finds
proteins in public databases, we present annotations for several pathogenic
organisms:


Mycobacterium tuberculosis H37Rv, complete genome
Mycobacterium tuberculosis CDC1551, complete genome
Yersinia pestis strain CO92, complete genome
Yersinia pestis KIM, complete genome
Bacillus anthracis A2012 main chromosome

 Example of annotation of Yersinia pestis KIM

Prediction of potential operons and genes in microbial  genomes
 Time:   Mon Nov 18 11:07:36 2002
 Seq name: gi|22123922|ref|NC_004088.1| Yersinia pestis KIM, complete genome
 Length of sequence - 4600755 bp
 Number of predicted genes - 4011, with homology - 3927
 Number of transcription units - 2364, operons - 799

N      Tu/Op   Conserved  S             Start         End    Score
1     1 Op  1   2/0.311   -    CDS         21 -       461    375  ## COG0716
2     1 Op  2     .       -    CDS        554 -      1015    362  ## COG1522 Transcriptional regulators
3     2 Tu  1     .       +    CDS       1185 -      2177   1148  ## COG2502 Asparagine synthetase A


The new FgenesB is the fastest (the E. coli genome is analyzed in ~14 sec) and
most accurate ab initio bacterial gene prediction program available.


It uses parameters learned for different bacteria by the FgenesB-train script,
whose only input is the new bacterial sequence. The script automatically
creates a file with gene prediction parameters for the analyzed organism.
It takes only ~10 minutes to create such a file for a genome like E. coli
from its sequence. If you need parameters for a new bacterium, please
contact Softberry Inc.; we can include them in the web list.

The algorithm is based on pattern recognition of different types of signals
and Markov chain models of coding regions. The optimal combination of these
features is then found by dynamic programming, and a set of gene models
is constructed along the given sequence.
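
As a rough illustration of the dynamic programming step only, here is a
minimal Python sketch. It is not the FgenesB implementation, whose details
are not given here. It assumes each candidate gene already has a combined
score (standing in for the signal and Markov chain scores), and it picks the
non-overlapping set of candidates with the highest total score. The function
name best_gene_set and the overlapping candidate (400, 1015, 300) are
hypothetical; the other coordinates and scores are borrowed from the example
output above purely for illustration.

```python
from bisect import bisect_right

def best_gene_set(candidates):
    """Pick non-overlapping candidates with the maximum total score.

    candidates: list of (start, end, score), coordinates 1-based inclusive.
    Returns (total_score, chosen_candidates_in_sequence_order).
    """
    cands = sorted(candidates, key=lambda c: c[1])  # sort by end position
    ends = [c[1] for c in cands]
    n = len(cands)
    best = [0.0] * (n + 1)   # best[i]: best total using the first i candidates
    take = [False] * (n + 1) # take[i]: whether candidate i is in that optimum
    prev = [0] * (n + 1)     # prev[i]: how many candidates end before i starts

    for i, (start, end, score) in enumerate(cands, start=1):
        # Count earlier candidates that end strictly before this one starts.
        p = bisect_right(ends, start - 1, 0, i - 1)
        prev[i] = p
        with_i = best[p] + score
        if with_i > best[i - 1]:
            best[i] = with_i
            take[i] = True
        else:
            best[i] = best[i - 1]

    # Trace back through the table to recover the chosen candidates.
    chosen = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(cands[i - 1])
            i = prev[i]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen

# Illustrative input; (400, 1015, 300) is an invented competing candidate.
candidates = [
    (21, 461, 375),
    (400, 1015, 300),
    (554, 1015, 362),
    (1185, 2177, 1148),
]
total, genes = best_gene_set(candidates)
print(total, genes)
```

The invented candidate overlaps the first gene and scores lower than the
alternative, so it is dropped from the final set.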

In the current FgenesB version, the operon prediction model is based on
distances between genes. It accurately recognizes 70% of single
transcription units and exactly defines about 43% of operons (~92%
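
To make the distance-based idea concrete, here is a minimal Python sketch.
It is not the FgenesB model itself. It assumes that adjacent genes on the
same strand are placed in one transcription unit when the gap between them
is at most max_gap base pairs, and that a unit with two or more genes is
called an operon. The Gene class, the function name, and the 100 bp
threshold are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Gene:
    start: int   # 1-based start coordinate
    end: int     # 1-based end coordinate, inclusive
    strand: str  # "+" or "-"

def group_transcription_units(genes: List[Gene], max_gap: int = 100) -> List[List[Gene]]:
    """Group genes into transcription units by strand and intergenic distance.

    max_gap is an assumed threshold, not a value taken from FgenesB.
    """
    units: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if units:
            prev = units[-1][-1]
            gap = gene.start - prev.end - 1  # bases between the two genes
            if gene.strand == prev.strand and gap <= max_gap:
                units[-1].append(gene)
                continue
        units.append([gene])  # start a new transcription unit
    return units

# Coordinates and strands of the first three genes in the Y. pestis KIM example.
genes = [Gene(21, 461, "-"), Gene(554, 1015, "-"), Gene(1185, 2177, "+")]
units = group_transcription_units(genes)
labels = ["Op" if len(u) > 1 else "Tu" for u in units]
print(labels)
```

With the assumed 100 bp threshold, the first two genes (92 bp apart, same
strand) fall into one operon and the third gene forms a single transcription
unit, which agrees with the Op/Tu labels in the example output above.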

