FGENESH-2: improving gene-finding accuracy by using similar genomic regions of
two organisms (such as Human and Mouse) that contain homologous genes.
This FGENESH-type program, which predicts multiple genes in genomic DNA sequences
using an HMM gene model, is available for public use at:
http://www.softberry.com/gfs.html
Ab initio gene prediction programs usually predict a significant fraction of the exons in a gene correctly, but they often
assemble the gene incorrectly: they combine several genes into one, split one gene into several, skip exons, or include false exons.
Using information from two organisms can significantly improve the accuracy of EXACT gene finding, taking into account that
the Human genome draft sequence and the Mouse genomic sequence provide many homologous sequences.
The program shows the genes predicted in both sequences as two sequential FGENESH outputs.
EXAMPLE of output for genes predicted in Human and Mouse genomic sequences:
Organism: h Given similarity: 96
FGENESH-2 1.C Prediction of potential genes in 1st genomic DNA
Time: Fri Nov 10 02:55:51 2000
Seq name: HSCKIIBE
Length of sequence: 5917 GC content: 53 Zone: 3
Number of predicted genes 1 in +chain 1 in -chain 0
Number of predicted exons 6 in +chain 6 in -chain 0
Positions of predicted genes and exons:
G Str Feature   Start     End    Score        ORF         Len
1 +    1 CDSf    1634 -  1705    18.99    1634 -  1705     72
1 +    2 CDSi    2672 -  2774    38.26    2672 -  2773    102
1 +    3 CDSi    3344 -  3459    41.09    3346 -  3459    114
1 +    4 CDSi    3906 -  3981    25.73    3906 -  3980     75
1 +    5 CDSi    4128 -  4317    67.44    4130 -  4315    186
1 +    6 CDSl    4645 -  4735    29.35    4646 -  4735     90
1 +      PolA    4855             0.92
Predicted protein(s):
>FGENESH-2 1 6 exon (s) 1634 - 4735 215 aa, chain +
MSSSEEVSWISWFCGLRGNEFFCEVDEDYIQDKFNLTGLNEQVPHYRQALDMILDLEPDE
ELEDNPNQSDLIEQAAEMLYGLIHARYILTNRGIAQMLEKYQQGDFGYCPRVYCENQPML
PIGLSDIPGEAMVKLYCPKCMDVYTPKSSRHHHTDGAYFGTGFPHMLFMVHPEYRPKRPA
NQFVPRLYGFKIHPMAYQLQLQAASNFKSPVKTIR
FGENESH-2 1.C Prediction of potential genes in 2nd genomic DNA
Time: Fri Nov 10 02:55:51 2000
Seq name: MMGMCK2B
Length of sequence: 7874 GC content: 51 Zone: 2
Number of predicted genes 1 in +chain 1 in -chain 0
Number of predicted exons 6 in +chain 6 in -chain 0
Positions of predicted genes and exons:
G Str Feature   Start     End    Score        ORF         Len
1 +    1 CDSf    2169 -  2240    38.64    2169 -  2240     72
1 +    2 CDSi    2829 -  2931    28.70    2829 -  2930    102
1 +    3 CDSi    4112 -  4227    36.45    4114 -  4227    114
1 +    4 CDSi    4615 -  4690    18.76    4615 -  4689     75
1 +    5 CDSi    4801 -  4990    56.00    4803 -  4988    186
1 +    6 CDSl    6262 -  6352    18.70    6263 -  6352     90
1 +      PolA    6470             0.92
Predicted protein(s):
>FGENESH-2 1 6 exon (s) 2169 - 6352 215 aa, chain +
MSSSEEVSWISWFCGLRGNEFFCEVDEDYIQDKFNLTGLNEQVPHYRQALDMILDLEPDE
ELEDNPNQSDLIEQAAEMLYGLIHARYILTNRGIAQMLEKYQQGDFGYCPRVYCENQPML
PIGLSDIPGEAMVKLYCPKCMDVYTPKSSRHHHTDGAYFGTGFPHMLFMVHPEYRPKRPA
NQFVPRLYGFKIHPMAYQLQLQAASNFKSPVKTIR
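
The exon tables in this example are regular enough to be checked by a script. The following is a minimal Python sketch, not part of FGENESH-2, which assumes the row layout shown in the example (gene, strand, exon number, feature, start, end, score, ORF, length). The names Exon, parse_exons and exon_lengths are hypothetical and chosen only for illustration. The sketch reads both reports and compares the exon lengths predicted in the two sequences.

```python
import re
from typing import Dict, List, NamedTuple


class Exon(NamedTuple):
    gene: int
    strand: str
    number: int
    feature: str   # e.g. CDSf, CDSi, CDSl as they appear in the table
    start: int
    end: int
    score: float


# One exon row, e.g. "1 + 1 CDSf 1634 - 1705 18.99 1634 - 1705 72".
# Rows without an exon number (such as PolA) do not match and are skipped.
ROW = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDS\w)\s+(\d+)\s+-\s+(\d+)\s+(-?\d+(?:\.\d+)?)\s"
)


def parse_exons(report: str) -> Dict[str, List[Exon]]:
    """Group exon rows by the most recent 'Seq name:' line."""
    exons: Dict[str, List[Exon]] = {}
    current = None
    for line in report.splitlines():
        stripped = line.strip()
        if stripped.startswith("Seq name:"):
            current = stripped.split(":", 1)[1].strip()
            exons[current] = []
            continue
        match = ROW.match(line)
        if match and current is not None:
            g, s, n, f, a, b, sc = match.groups()
            exons[current].append(
                Exon(int(g), s, int(n), f, int(a), int(b), float(sc))
            )
    return exons


def exon_lengths(exons: List[Exon]) -> List[int]:
    """Length of each exon in bp, with inclusive coordinates."""
    return [e.end - e.start + 1 for e in exons]


# Exon rows copied from the example output in this document.
SAMPLE = """\
Seq name: HSCKIIBE
1 + 1 CDSf 1634 - 1705 18.99 1634 - 1705 72
1 + 2 CDSi 2672 - 2774 38.26 2672 - 2773 102
1 + 3 CDSi 3344 - 3459 41.09 3346 - 3459 114
1 + 4 CDSi 3906 - 3981 25.73 3906 - 3980 75
1 + 5 CDSi 4128 - 4317 67.44 4130 - 4315 186
1 + 6 CDSl 4645 - 4735 29.35 4646 - 4735 90
1 + PolA 4855 0.92
Seq name: MMGMCK2B
1 + 1 CDSf 2169 - 2240 38.64 2169 - 2240 72
1 + 2 CDSi 2829 - 2931 28.70 2829 - 2930 102
1 + 3 CDSi 4112 - 4227 36.45 4114 - 4227 114
1 + 4 CDSi 4615 - 4690 18.76 4615 - 4689 75
1 + 5 CDSi 4801 - 4990 56.00 4803 - 4988 186
1 + 6 CDSl 6262 - 6352 18.70 6263 - 6352 90
1 + PolA 6470 0.92
"""

reports = parse_exons(SAMPLE)
human = exon_lengths(reports["HSCKIIBE"])
mouse = exon_lengths(reports["MMGMCK2B"])
print("Human exon lengths:", human)
print("Mouse exon lengths:", mouse)
print("Same exon structure:", human == mouse)
print("Total coding length (bp):", sum(human))
```

For the example output, both sequences give exon lengths of 72, 103, 116, 76, 190 and 91 bp, 648 bp in total. That corresponds to 216 codons: the 215 aa of the predicted protein plus, presumably, the stop codon.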
---