The Baylor College Of Medicine Computational Biology Group
Houston, TX
announces a new service
HSPL
Email server for splice site prediction in Human sequences
***************************************************************************
*********** NOTE ADDRESSES AND FORMATS HAVE CHANGED!! *********************
***************************************************************************
Analysis of uncharacterized human sequences is available by sending a
file containing the sequence name on the first line, followed by the
sequence (no more than 80 characters per line), to
service at bchs.uh.edu
with the subject line "HSPL".
Example: mail -s HSPL service at bchs.uh.edu < test.seq
where test.seq is a file containing the sequence.
NOTE: This service is temporarily being provided through the
University of Houston Gene-Server. Only two jobs will be run at a
time.
Method description:
*******************
A combined linear discriminant recognition function was developed
using information about significant triplet frequencies in various
functional parts of splice site regions, together with octanucleotide
preferences in protein-coding and intron regions. On the test set, the
splice site prediction scheme recognizes donor sites with 97% accuracy
(correlation coefficient C=0.62) and acceptor sites with 96% accuracy
(C=0.48). The method is a good alternative to the neural network
approach (Brunak et al., Mol. Biol., 1991), which has C=0.61 with 95%
accuracy of donor site prediction and C < 40 with 95% accuracy of
acceptor site prediction.
More precise splice site positions may be found by using the exon
recognition programs (HEXON, FEXH) and the gene structure prediction
program (FGENEH) available from the server.
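The announcement does not define the correlation coefficient C. A
minimal sketch, assuming C is the Matthews correlation coefficient
computed from true/false positive and negative counts (the function
name and arguments are illustrative and are not part of HSPL):

```python
import math


def correlation_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Assumption: this is the quantity reported as C in the text;
    the announcement itself does not give the formula.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Undefined when any marginal total is zero; return 0 by convention.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Under this assumption, C penalizes false positives among the many
non-site positions, which is one way a high accuracy figure can
coexist with a moderate value of C.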
========================= HSPL citation ===============================
You should cite one of the following papers in your references:
Solovyev V.V., Lawrence C.B. (1994) Prediction of Primate mRNA donor
and acceptor splice sites based on oligonucleotide composition. Mol.Biol.
(submitted).
or
Solovyev V.V., Salamov A.A., Lawrence C.B. 1994. The prediction of Human exons
by oligonucleotide composition and discriminant analysis of spliceable open
reading frames. In Proceedings of the Second International Conference on
Intelligent Systems for Molecular Biology (eds. Altman R., Brutlag D.,
Karp R., Lathrop R. and Searls D.), AAAI Press, Menlo Park, CA (in press).
or
Solovyev V.V., Salamov A.A., Lawrence C.B. 1994.
Predicting internal exons by oligonucleotide composition and discriminant
analysis of spliceable open reading frames. Nucleic Acids Res. (in press).
The current version of the program predicts only splice sites with the
conserved GT and AG dinucleotides for donor and acceptor splice sites,
respectively. These usually account for more than 99% of all authentic
splice sites.
Future versions of the program will have options for other variants
of the conserved dinucleotides and an extension to other species.
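To illustrate the GT/AG restriction, the sketch below lists every
position in a sequence where a donor (GT) or acceptor (AG)
dinucleotide occurs. It is only a candidate enumeration and not the
HSPL scoring function. The function name is hypothetical, and the
coordinate convention (1-based position of the first base of the
dinucleotide) is an assumption, since the announcement does not state
which base the reported positions refer to.

```python
def candidate_sites(sequence):
    """Return (donor_positions, acceptor_positions) for GT and AG.

    Positions are 1-based and point at the first base of the
    dinucleotide (an assumed convention, not taken from HSPL).
    """
    # Remove whitespace and line breaks, and ignore letter case.
    seq = "".join(sequence.split()).upper()
    donors = [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i + 1 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors
```

Most such candidates are not real splice sites, which is why a
discriminant score and a threshold are applied on top of the
dinucleotide match.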
Input data:
***************
The following is an example of the data format for the program:
Line 1 contains two thresholds (donor and acceptor). You can use the
values shown or lower them slightly if you want to see more candidate
sites.
Line 2 is the name of your sequence, starting with a space character.
Line 3 and the following lines contain the sequence (each line must be
no more than 80 letters).
-----------------------------------------------------------
76 65
HUMALPHA ds-DNA
cccgggctgtgtgcttccagcctcccctcctctcgacaccagaacagagcctggccccca
gctcccaggaaatacagaaaaaaaaaatggtggatgaacgagtgacagggtgtcttgttc
cacacaagacacagtgagcaggggttgggggaggggcccctggggcaggatgcacactgc
actatacccaaaatccccacccttccctggggacacctggtccaccctaagctgcctttc
---------------------------------------------------------------
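A request body in this layout can be assembled programmatically. The
sketch below is a minimal helper, not part of the server software; the
function name, its parameters, and the default line width of 60 (taken
from the example above) are assumptions. The name line is written with
a leading space, which is one reading of the instruction for line 2.

```python
def build_hspl_input(name, sequence, donor_threshold=76,
                     acceptor_threshold=65, width=60):
    """Return the text of an HSPL request: thresholds, name, sequence."""
    if not 1 <= width <= 80:
        raise ValueError("width must be between 1 and 80")
    # Drop spaces and line breaks so any input layout is accepted.
    seq = "".join(sequence.split())
    lines = [f"{donor_threshold} {acceptor_threshold}", " " + name.strip()]
    # Split the sequence into fixed-width lines of at most `width` letters.
    lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

The returned text can be saved to a file such as test.seq and mailed
with the subject line "HSPL".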
The output of the program (enclosed below) includes the name and length
of the sequence and the positions and scores of the predicted splice
sites. Note that some of the predicted sites are pseudo splice sites;
the higher the score of a site, the more likely it is to be an
authentic splice site.
Please send questions, comments, and suggestions about the program by
Email to solovyev at cmb.bcm.tmc.edu.
Program output:
HUMALPHA 4556 bp ds-DNA PRI 15-SEP-1
Length of sequence - 4556
Number of Donor sites: 11 Threshold: 0.76
1 329 0.76
2 517 0.87
3 728 0.88
4 955 0.98
5 1322 0.81
6 1954 0.85
7 1967 0.82
8 2126 0.84
9 2389 0.84
10 2662 0.79
11 2998 0.92
Number of Acceptor sites: 18 Threshold: 0.65
1 244 0.65
2 379 0.67
3 610 0.89
4 615 0.68
5 838 0.83
6 1146 0.75
7 1398 0.71
8 1818 0.78
9 1828 0.66
10 2052 0.88
11 2253 0.84
12 2469 0.81
13 2880 0.81
14 3119 0.80
15 3480 0.70
16 3989 0.69
17 4059 0.70
18 4273 0.71
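A reply in the layout shown above can be read into lists of
(position, score) pairs. The sketch below assumes the format is
exactly as in this example: a "Number of Donor sites" or "Number of
Acceptor sites" header, followed by rows of index, position and score.
The function name and the returned structure are illustrative.

```python
import re

# Matches the section headers, e.g. "Number of Donor sites: 11 Threshold: 0.76"
HEADER = re.compile(
    r"Number of (Donor|Acceptor) sites:\s*(\d+)\s+Threshold:\s*([\d.]+)"
)


def parse_hspl_output(text):
    """Return {"donor": [(position, score), ...], "acceptor": [...]}."""
    sites = {"donor": [], "acceptor": []}
    current = None  # section being read; None before the first header
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            current = header.group(1).lower()
            continue
        fields = line.split()
        if current and len(fields) == 3:
            try:
                sites[current].append((int(fields[1]), float(fields[2])))
            except ValueError:
                pass  # not an "index position score" row; skip it
    return sites
```

Sorting each list by score in descending order puts the sites most
likely to be authentic first.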