We have installed a new version of FGENESH (GC), an HMM-based gene-finding program that predicts multiple genes in genomic DNA, including exons with GC donor sites, at
http://genomic.sanger.ac.uk/
FGENESH (with possible GC donor sites) - Prediction of multiple genes in genomic DNA sequences
A new version of the FGENESH program that allows the noncanonical GC dinucleotide at donor splice sites.
This is the first program to include noncanonical exons in its predictions.
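Allowing GC at donor sites enlarges the set of candidate splice sites a gene finder has to evaluate. The minimal sketch below is not FGENESH code (the function name `candidate_donors` is hypothetical); it lists the 0-based positions where an intron could begin, with and without the GC option.

```python
def candidate_donors(seq, allow_gc=True):
    """Return 0-based positions where an intron could begin.

    A canonical donor intron starts with GT; if allow_gc is True, the
    noncanonical GC dinucleotide is accepted as well.
    """
    seq = seq.upper()
    allowed = {"GT", "GC"} if allow_gc else {"GT"}
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] in allowed]


example = "AAGGTAAGTCAGGCAAGT"
print(candidate_donors(example, allow_gc=False))  # [3, 7, 16]
print(candidate_donors(example))                  # [3, 7, 12, 16]
```

A real predictor does not accept every dinucleotide match; each candidate is then scored with a splice-site model and weighed within the gene model.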
GC donor splice sites account for the majority of non-standard splice sites in human genes. They represent about 0.6% of all splice sites and are observed in more than 5% of human genes.
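The two figures are roughly compatible under a simplifying assumption that is ours, not the authors': if each donor site were GC independently with probability 0.006, a gene with n introns would have at least one GC donor with probability 1 - (1 - 0.006)^n, which first exceeds 5% at n = 9. A sketch of that arithmetic (function names hypothetical):

```python
import math

P_GC = 0.006  # share of donor sites that are GC (about 0.6%, from the text)


def frac_genes_with_gc(n_introns, p=P_GC):
    """Probability that a gene with n_introns has at least one GC donor,
    assuming each donor site is GC independently with probability p."""
    return 1.0 - (1.0 - p) ** n_introns


def introns_needed(target=0.05, p=P_GC):
    """Smallest intron count for which that probability reaches target."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


for n in (1, 5, 9, 20):
    print(n, round(frac_genes_with_gc(n), 4))
print(introns_needed())  # 9
```

Real GC donors are not independent across a gene, so this only shows that the per-site and per-gene percentages are of consistent magnitude.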
We recently investigated noncanonical splice sites (Burset, Seledtsov and Solovyev, 1999, in preparation) and obtained about 20,000 EST-verified splice sites.
From these data we derived a highly discriminative GC donor site weight matrix, which is used in the gene prediction program.
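A weight matrix of this kind scores a candidate site by summing per-position log-odds values over a window around the exon/intron boundary. The sketch below is a toy illustration only: the frequencies, the 3 + 6 base window, the uniform background, the pseudocount and the function names are all hypothetical, and this is not the matrix used by FGENESH (GC).

```python
import math

# Toy position frequency matrix for a GC donor site: the last 3 exon bases
# (-3..-1) and the first 6 intron bases (+1..+6). Illustrative values only;
# this is NOT the matrix used by FGENESH (GC).
GC_DONOR_FREQS = [
    {"A": 0.35, "C": 0.35, "G": 0.20, "T": 0.10},  # exon -3
    {"A": 0.60, "C": 0.10, "G": 0.15, "T": 0.15},  # exon -2
    {"A": 0.05, "C": 0.03, "G": 0.90, "T": 0.02},  # exon -1
    {"A": 0.00, "C": 0.00, "G": 1.00, "T": 0.00},  # intron +1 (always G)
    {"A": 0.00, "C": 1.00, "G": 0.00, "T": 0.00},  # intron +2 (always C)
    {"A": 0.80, "C": 0.05, "G": 0.10, "T": 0.05},  # intron +3
    {"A": 0.75, "C": 0.05, "G": 0.10, "T": 0.10},  # intron +4
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},  # intron +5
    {"A": 0.15, "C": 0.15, "G": 0.20, "T": 0.50},  # intron +6
]
EXON_BASES = 3      # window positions that lie in the exon
BACKGROUND = 0.25   # uniform background base frequency (assumption)
PSEUDO = 0.001      # pseudocount so that zero frequencies stay finite


def log_odds(freqs):
    """Convert frequencies to natural-log odds against the background."""
    return [
        {b: math.log(((f + PSEUDO) / (1 + 4 * PSEUDO)) / BACKGROUND)
         for b, f in col.items()}
        for col in freqs
    ]


GC_DONOR_LLR = log_odds(GC_DONOR_FREQS)


def score_donor(seq, intron_start, llr=GC_DONOR_LLR):
    """Sum per-position log-odds for the window around a candidate donor.

    intron_start is the 0-based index of the first intron base (the G of GC).
    Returns None if the window runs off the sequence or has non-ACGT bases.
    """
    begin = intron_start - EXON_BASES
    window = seq[begin:begin + len(llr)].upper() if begin >= 0 else ""
    if len(window) != len(llr) or any(b not in "ACGT" for b in window):
        return None
    return sum(col[b] for col, b in zip(llr, window))


print(score_donor("TTCAGGCAAGTTT", 5))  # consensus-like GC donor: high score
print(score_donor("TTCAGGCTTCCTT", 5))  # GC present but poor context: lower
print(score_donor("TTCAGGTAAGTTT", 5))  # GT site: strongly penalized at +2
```

Because the +1 and +2 columns are fixed to G and C, a GC-specific matrix like this one keeps GC donors separate from the canonical GT model instead of loosening the GT matrix.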
This variant of the program predicts GC donor exons in addition to standard exons, while preserving the program's accuracy on standard genes. Testing on 68 human genes with at least one GC donor site showed that FGENESH (GC) achieves a 10% higher rate of exact exon prediction for this group and 5% higher accuracy at the nucleotide level.
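The text does not define the two accuracy measures. The sketch below uses the commonly used definitions, which is an assumption about how this comparison was made: an exon counts as exactly predicted only if both of its boundaries match, and nucleotide-level sensitivity and specificity compare the sets of coding positions. Exons are inclusive (start, end) pairs, as in the output table; the function names and the example gene are hypothetical.

```python
def exact_exon_accuracy(true_exons, predicted_exons):
    """Exon-level sensitivity and specificity.

    An exon counts as correctly predicted only if both its start and its end
    match a true exon exactly. Exons are inclusive (start, end) pairs.
    """
    true_set, pred_set = set(true_exons), set(predicted_exons)
    correct = len(true_set & pred_set)
    return correct / len(true_set), correct / len(pred_set)


def nucleotide_accuracy(true_exons, predicted_exons):
    """Nucleotide-level sensitivity and specificity over coding positions."""
    def coding_positions(exons):
        return {pos for start, end in exons for pos in range(start, end + 1)}

    true_nt = coding_positions(true_exons)
    pred_nt = coding_positions(predicted_exons)
    overlap = len(true_nt & pred_nt)
    return overlap / len(true_nt), overlap / len(pred_nt)


# Hypothetical gene: the second exon's end is predicted 10 bases too far.
true = [(100, 200), (300, 400), (500, 600)]
pred = [(100, 200), (300, 410), (500, 600)]
print(exact_exon_accuracy(true, pred))  # both values are 2/3
print(nucleotide_accuracy(true, pred))
```

A single misplaced boundary costs a whole exon in the exact-exon measure but only a few positions at the nucleotide level, so the two measures can change by different amounts.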
Paste your nucleotide sequence into the first window, or upload a file containing the sequence in FASTA format.
Paste your protein sequence into the second window.
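A FASTA record is a header line beginning with '>' followed by the sequence on one or more lines. A minimal sketch for preparing such a record locally before uploading; the function name, the 60-character line width (the width used in the predicted-protein output) and the upper-casing are choices made here, not requirements stated by the server.

```python
def to_fasta(name, sequence, width=60):
    """Return a FASTA record: a '>' header line followed by the sequence
    split into lines of at most `width` characters."""
    seq = "".join(sequence.split()).upper()  # remove spaces and line breaks
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + name] + lines) + "\n"


record = to_fasta("my_contig", "acgt" * 40)
print(record, end="")
```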
References: Salamov A.A., Solovyev V.V. (1999), unpublished data.
Please cite: CGG Web server, http://genomic.sanger.ac.uk/
FGENESH+ output:
G - number of the predicted gene (counted from the sequence start)
Str - DNA strand (+ for the direct strand, - for the complementary strand)
Feature - type of coding sequence:
  CDSf - first coding segment (starting with a start codon)
  CDSi - internal coding segment (internal exon)
  CDSl - last coding segment (ending with a stop codon)
TSS - position of the transcription start (TATA-box position and score)
Start and End - positions of the feature
Weight - log likelihood*10 score for the feature
ORF-start/end - positions where the complete codons start and end
The last 3 values: Length of exon, positions in protein, % of
similarity w
n
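The exon lines of the output are whitespace-separated and can be read with a regular expression. The sketch below is hypothetical (names `Exon`, `parse_exons`), covers only the exon-line layout shown in the sample output, and assumes that the unlabeled third column is the exon number within the gene.

```python
import re
from typing import NamedTuple


class Exon(NamedTuple):
    gene: int        # G
    strand: str      # Str
    number: int      # unlabeled column, taken here to be the exon number
    feature: str     # CDSf, CDSi, CDSl, ...
    start: int
    end: int
    score: float
    orf_start: int
    orf_end: int
    length: int      # Len


# One exon line: G Str N Feature Start - End Score ORF-start - ORF-end Len
EXON_LINE = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(\w+)\s+(\d+)\s+-\s+(\d+)\s+"
    r"(-?[\d.]+)\s+(\d+)\s+-\s+(\d+)\s+(\d+)\s*$"
)


def parse_exons(report):
    """Extract exon lines from FGENESH-style text output; other lines are skipped."""
    exons = []
    for line in report.splitlines():
        m = EXON_LINE.match(line)
        if m:
            g, strand, n, feat, s, e, score, os_, oe, length = m.groups()
            exons.append(Exon(int(g), strand, int(n), feat, int(s), int(e),
                              float(score), int(os_), int(oe), int(length)))
    return exons


SAMPLE = """\
G Str Feature Start End Score ORF Len
1 + 1 CDSi 2577 - 2690 197.66 2579 - 2689 111
1 + 2 CDSi 2756 - 2936 312.35 2758 - 2934 177
1 + 3 CDSi 2991 - 3173 307.82 2992 - 3171 180
1 + 4 CDSl 3242 - 3419 301.90 3243 - 3419 177
"""

for exon in parse_exons(SAMPLE):
    print(exon.feature, exon.start, exon.end, exon.length)
```

In this sample, Len equals ORF-end - ORF-start + 1 for every exon, i.e. it counts the bases in complete codons. TSS lines and other record types would need their own patterns.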
FGENESH+ Prediction of potential genes in Human genomic DNA
Time: Mon Jul 26 21:38:41 1999
Seq name: Adh_and_cact.1 (2919020 bases) 848501 853000 Protein -
gi|23
4 Length 215 Sim: 90
Length of sequence: 4500 GC content: 40 Zone: 1
Number of predicted genes 1 in +chain 1 in -chain 0
Number of predicted exons 4 in +chain 4 in -chain 0
Positions of predicted genes and exons:
G Str  Feature  Start  End    Score    ORF           Len
1  +   1 CDSi   2577 - 2690   197.66   2579 - 2689   111
1  +   2 CDSi   2756 - 2936   312.35   2758 - 2934   177
1  +   3 CDSi   2991 - 3173   307.82   2992 - 3171   180
1  +   4 CDSl   3242 - 3419   301.90   3243 - 3419   177
Predicted protein(s):
>FGENESH 1 4 exon (s) 2577 - 3419 217 aa, chain +
PNMTAAPYNYNYIFKYIIIGDMGVGKSCLLHQFTEKKFMANCPHTIGVEFGTRIIEVDDK
KIKLQIWDTAGQERFRAVTRSYYRGAAGALMVYDITRRSTYNHLSSWLTDTRNLTNPSTV
IFLIGNKSDLESTREVTYEEAKEFADENGLMFLEASAMTGQNVEEAFLETARKIYQNIQE
GRLDLNASESGVQHRPSQPSRTSLSSEATGAKDQCSC
---
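As a consistency check on the sample output, the exon coordinates reproduce the reported protein length. The sketch assumes 1-based inclusive coordinates, takes the ORF start of the first exon (2579) as the first complete codon, and treats the CDSl segment as ending in a stop codon, as the legend says; the function name is hypothetical. The four exons span 656 bases; dropping the 2 leading bases of the incomplete first codon leaves 654 bases = 218 codons, and excluding the stop codon gives 217 residues, matching the "217 aa" header.

```python
def protein_length(exons, first_codon_start, ends_with_stop=True):
    """Number of amino acids encoded by spliced + strand exons.

    exons: inclusive (start, end) coordinates in transcription order.
    first_codon_start: coordinate where the first complete codon begins;
    bases of the first exon before it belong to an incomplete codon.
    """
    coding = sum(end - start + 1 for start, end in exons)
    coding -= first_codon_start - exons[0][0]  # drop the leading partial codon
    codons = coding // 3
    return codons - 1 if ends_with_stop else codons  # stop codon is not translated


SAMPLE_EXONS = [(2577, 2690), (2756, 2936), (2991, 3173), (3242, 3419)]
print(protein_length(SAMPLE_EXONS, first_codon_start=2579))  # 217
```

Codons that span splice junctions are counted automatically because the exon lengths are summed before dividing by 3.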