Advanced version of FGENES multiple gene prediction
Victor Solovyev
solovyev at sanger.ac.uk
Sat Nov 21 14:44:03 EST 1998
We have installed FGENES 1.6, a program for predicting multiple genes in
genomic DNA. It is available on the Computational Genomic Group WEB server
at http://genomic.sanger.ac.uk/ (http://genomic.sanger.ac.uk/gf/gf.html).
It performs significantly better than the previous version.
Enclosed are the accuracy results for the standard Guigo test set and for
a test set of long or multiple human genes, together with an example of a
100% correct prediction for a 32906 bp sequence: the human IFNAR gene for
the interferon alpha/beta receptor.
Exon-level figures count only exactly predicted exons (both boundaries
correct); nucleotide-level figures also give credit for partially predicted exons.
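To make the exon-level versus nucleotide-level distinction concrete, here is a minimal Python sketch. It assumes exons are given as 1-based inclusive (start, end) pairs and uses the usual definitions (sensitivity = correct / true, specificity = correct / predicted). The function names are hypothetical; this is not the evaluation code that produced the figures below.

```python
from typing import Dict, List, Set, Tuple

Exon = Tuple[int, int]  # (start, end), 1-based, inclusive (assumed convention)


def ratio(numerator: int, denominator: int) -> float:
    # Report 0.0 instead of failing when nothing was predicted.
    return numerator / denominator if denominator else 0.0


def coding_positions(exons: List[Exon]) -> Set[int]:
    """All nucleotide positions covered by a list of exons."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end + 1))
    return positions


def accuracy(true_exons: List[Exon], predicted_exons: List[Exon]) -> Dict[str, float]:
    # Exon level: only an exact match of both boundaries counts as correct.
    exact = len(set(true_exons) & set(predicted_exons))
    # Nucleotide level: every correctly predicted coding base counts,
    # so partially correct exons still contribute.
    true_nt = coding_positions(true_exons)
    pred_nt = coding_positions(predicted_exons)
    overlap = len(true_nt & pred_nt)
    return {
        "Sne": ratio(exact, len(true_exons)),
        "Spe": ratio(exact, len(predicted_exons)),
        "Sn_n": ratio(overlap, len(true_nt)),
        "Sp_n": ratio(overlap, len(pred_nt)),
    }


if __name__ == "__main__":
    print(accuracy([(100, 199), (300, 399)], [(100, 199), (310, 399)]))
```

In this toy case the first exon is predicted exactly and the second has its start misplaced by 10 bases, so both exon-level measures are 0.5 while the nucleotide-level sensitivity is 0.95 and the specificity is 1.0.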
Guigo dataset of 570 genes:
===================================
Fgenes 1.6:
ALL EXONS: OBSERVED - 2663 EXACTLY PREDICTED: 2233 84%
(averaged over all genes)
Sne- 82.7  Spe- 82.0  Sn_n- 91.9  Sp_n- 93.1  C- 0.92
no prediction cases - 1
Init: Observed -  570  Predicted -  576  Correct -  470  82%
Intr: Observed - 1523  Predicted - 1548  Correct - 1311  86%
Term: Observed -  570  Predicted -  567  Correct -  452  79%
Sngl: Observed -    0  Predicted -    7  Correct -    0
Genscan:
ALL EXONS: OBSERVED - 2663 EXACTLY PREDICTED: 2166 81%
Sne- 77.7  Spe- 80.8  Sn_n- 93.1  Sp_n- 92.8  C- 0.92
no prediction cases - 8
Init: Observed -  570  Predicted -  449  Correct -  369  65%
Intr: Observed - 1523  Predicted - 1688  Correct - 1366  90%
Term: Observed -  570  Predicted -  487  Correct -  431  76%
Sngl: Observed -    0  Predicted -    3  Correct -    0
Sne - sensitivity on the exon level; Spe - specificity on the exon level;
Sn_n - sensitivity on the nucleotide level; Sp_n - specificity on the
nucleotide level; C - correlation coefficient on the nucleotide level
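The nucleotide-level measures can be computed from per-base counts. The sketch below assumes that C is the nucleotide-level correlation coefficient from the Burset and Guigo evaluation scheme; that is an assumption, since the tables only print the label. The function name is hypothetical.

```python
import math
from typing import Dict


def nucleotide_accuracy(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Nucleotide-level Sn, Sp and correlation coefficient from per-base counts.

    tp: coding bases predicted as coding
    fp: non-coding bases predicted as coding
    tn: non-coding bases predicted as non-coding
    fn: coding bases predicted as non-coding
    """
    sn = tp / (tp + fn) if tp + fn else 0.0  # share of true coding bases found
    sp = tp / (tp + fp) if tp + fp else 0.0  # share of predicted coding bases that are real
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0
    return {"Sn_n": sn, "Sp_n": sp, "C": cc}


if __name__ == "__main__":
    print(nucleotide_accuracy(tp=90, fp=10, tn=890, fn=10))
```

With 90 true positives, 10 false positives, 890 true negatives and 10 false negatives, sensitivity and specificity are both 0.9 while C is 80000 / 90000, about 0.89. Unlike Sn_n and Sp_n, C also takes the correctly rejected non-coding bases into account.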
The dataset of 38 human genomic sequences
(19 genes 20000-240000 bp long and 19 sequences containing multiple genes):
=============================================================
Fgenes 1.6:
ALL EXONS: OBSERVED - 705 EXACTLY PREDICTED: 590 84%
(averaged over all exons)
Sne- 83.7  Spe- 68.3  Sn_n- 92.0  Sp_n- 75.9  C- 0.84
no prediction cases - 1
Init: Observed -  71  Predicted - 118  Correct -  50  70%
Intr: Observed - 557  Predicted - 624  Correct - 489  88%
Term: Observed -  71  Predicted - 116  Correct -  51  72%
Sngl: Observed -   6  Predicted -   6  Correct -   0
Genscan:
ALL EXONS: OBSERVED - 705 EXACTLY PREDICTED: 553 78%
(averaged over all exons)
Sne- 78.4  Spe- 66.1  Sn_n- 92.4  Sp_n- 69.8  C- 0.80
no prediction cases - 1
Init: Observed -  71  Predicted -  93  Correct -  36  51%
Intr: Observed - 557  Predicted - 635  Correct - 469  84%
Term: Observed -  71  Predicted -  98  Correct -  48  68%
Sngl: Observed -   6  Predicted -  11  Correct -   0
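The exon counts in the 38-sequence tables can be recombined as a consistency check. In the short Python sketch below, the counts are copied from those tables. Pooling Correct over Observed and over Predicted across all exon types reproduces the Sne and Spe values printed under "(averaged over all exons)". The per-type percentage column matches Correct/Observed. The per-type specificity (Correct/Predicted) is not printed in the tables and is derived here only for illustration.

```python
# Exon counts copied from the 38-sequence tables: (observed, predicted, correct).
COUNTS = {
    "Fgenes 1.6": {
        "Init": (71, 118, 50),
        "Intr": (557, 624, 489),
        "Term": (71, 116, 51),
        "Sngl": (6, 6, 0),
    },
    "Genscan": {
        "Init": (71, 93, 36),
        "Intr": (557, 635, 469),
        "Term": (71, 98, 48),
        "Sngl": (6, 11, 0),
    },
}


def pooled_exon_accuracy(rows):
    """Sne and Spe in percent, pooling the counts of all exon types."""
    observed = sum(obs for obs, _, _ in rows.values())
    predicted = sum(pred for _, pred, _ in rows.values())
    correct = sum(corr for _, _, corr in rows.values())
    return round(100 * correct / observed, 1), round(100 * correct / predicted, 1)


for program, rows in COUNTS.items():
    sne, spe = pooled_exon_accuracy(rows)
    print(f"{program}: Sne {sne} Spe {spe}")
    for kind, (obs, pred, corr) in rows.items():
        # Sn = correct/observed (the table's % column); Sp = correct/predicted.
        print(f"  {kind}: Sn {100 * corr / obs:.0f}%  Sp {100 * corr / pred:.0f}%")
```

The pooled values are 83.7/68.3 for Fgenes 1.6 and 78.4/66.1 for Genscan, which match the tables. This suggests the "averaged over all exons" figures are ratios of pooled counts.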
FGENES 1.6 Prediction of multiple genes in genomic DNA
Time: 18:22:57 Date: Sat Nov 21 1998
Seq name: > HSIFNAR 32906 bp DNA PRI 25-NO
Length of sequence: 32906 GC content: 0.41 Zone: 1
Number of predicted genes: 1 In +chain: 1 In -chain: 0
Number of predicted exons: 11 In +chain: 11 In -chain: 0
Positions of predicted genes and exons:
 G Str  Feature     Start       End   Weight  ORF-start   ORF-end
 1   +   1 CDSf       754 -     829    13.59        754 -     828
 1   +   2 CDSi     11201 -   11324     4.41      11203 -   11322
 1   +   3 CDSi     16768 -   16943     4.15      16769 -   16942
 1   +   4 CDSi     19033 -   19187     3.66      19035 -   19187
 1   +   5 CDSi     19300 -   19441     2.77      19300 -   19440
 1   +   6 CDSi     21017 -   21131     3.76      21019 -   21129
 1   +   7 CDSi     24861 -   25060     3.24      24862 -   25059
 1   +   8 CDSi     25159 -   25313     2.63      25161 -   25313
 1   +   9 CDSi     28528 -   28678     5.40      28528 -   28677
 1   +  10 CDSi     29408 -   29553     3.02      29410 -   29553
 1   +  11 CDSl     31085 -   31318     4.23      31085 -   31315
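The exon table can be checked against the predicted-protein header. This minimal Python sketch assumes that Start/End are 1-based inclusive coordinates and that the End of the last (CDSl) exon includes the stop codon. The helper names are hypothetical. Under these assumptions the coding length is 1674 bp, or 558 codons. Dropping the stop codon leaves 557 amino acids, which agrees with the "557 a" in the protein header.

```python
# Exon coordinates (Start, End) copied from the FGENES 1.6 table above.
EXONS = [
    (754, 829), (11201, 11324), (16768, 16943), (19033, 19187),
    (19300, 19441), (21017, 21131), (24861, 25060), (25159, 25313),
    (28528, 28678), (29408, 29553), (31085, 31318),
]


def coding_length(exons):
    """Total coding length, assuming 1-based inclusive coordinates."""
    return sum(end - start + 1 for start, end in exons)


def protein_length(exons):
    """Amino acids encoded, assuming the last exon ends with the stop codon."""
    total = coding_length(exons)
    if total % 3 != 0:
        raise ValueError(f"coding length {total} is not a multiple of 3")
    return total // 3 - 1  # drop the stop codon


print(coding_length(EXONS), protein_length(EXONS))
```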
Predicted proteins:
>FGENES 1.5 > HSIFNAR 1 Multiexon gene 754 - 31318 557 a Ch+
MMVVLLGATTLVLVAVAPWVLSAAAGGKNLKSPQKVEVDIIDDNFILRWNRSDESVGNVT
FSFDYQKTGMDNWIKLSGCQNITSTKCNFSSLKLNVYEEIKLRIRAEKENTSSWYEVDSF
TPFRKAQIGPPEVHLEAEDKAIVIHISPGTKDSVMWALDGLSFTYSLLIWKNSSGVEERI
ENIYSRHKIYKLSPETTYCLKVKAALLTSWKIGVYSPVHCIKTTVENELPPPENIEVSVQ
NQNYVLKWDYTYANMTFQVQWLHAFLKRNPGNHLYKWKQIPDCENVKTTQCVFPQNVFQK
GIYLLRVQASDGNNTSFWSEEIKFDTEIQAFLLPPVFNIRSLSDSFHIYIGAPKQSGNTP
VIQDYPLIYEIIFWENTSNAERKIIEKKTDVTVPNLKPLTVYCVKARAHTMDEKLNKSSV
FSDAVCEKTKPGNTSKIWLIVGICIALFALPFVIYAAKVFLRCINYVFFPSLKPSSSIDE
YFSEQPLKNLLLSTSEEQIEKCFIIENISTIATVEETNQTDEDHKKYSSQTSQDSGNYSN
EDESESKTSEELQQDFV
--
Victor Solovyev
The Sanger Centre, Hinxton, Cambridge CB10 1SA, UK
Email: solovyev at sanger.ac.uk http://genomic.sanger.ac.uk
Phone: 44-1223-494799 FAX: 44-1223-494919