PSITE - Search for PROSITE patterns with statistical estimation
(Version 1) by Solovyev V.V.
Analysis of amino acid sequences is available through WWW:
http://dot.imgen.bcm.tmc.edu:9331/pssprediction/pssp.html
Method description:
The method is based on a statistical estimate of the expected number of
occurrences of a PROSITE pattern in a given sequence. It uses the PROSITE
database (author: Amos Bairoch, 1995) of functional motifs. If a pattern is
found whose expected number is significantly less than 1,
it can be supposed that the analysed sequence possesses the
function associated with the pattern. Version 1 is the simplest
version: it searches for patterns without any deviation from the given
PROSITE consensus. A following version will allow such deviations.
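The idea of an expected number can be illustrated with a minimal sketch.
It is not necessarily the formula PSITE uses: it assumes that residues are
independent, that amino acid frequencies are taken from the query sequence
itself, and that the pattern has a fixed length. The function names
position_probabilities and expected_number are hypothetical.

```python
import re
from collections import Counter


def position_probabilities(pattern, freqs):
    """Per-position match probabilities for a fixed-length PROSITE pattern.

    Assumption: residues are independent with frequencies `freqs`.
    """
    probs = []
    for element in pattern.strip().rstrip(".").split("-"):
        element = element.strip("<>")  # terminus anchors are ignored here
        if "," in element:
            raise ValueError("variable-length elements are not supported")
        match = re.fullmatch(r"(.+?)(?:\((\d+)\))?", element)
        core, count = match.group(1), int(match.group(2) or 1)
        if core == "x":                      # any residue
            p = 1.0
        elif core.startswith("["):           # one of the listed residues
            p = sum(freqs.get(a, 0.0) for a in core[1:-1])
        elif core.startswith("{"):           # any residue except those listed
            p = 1.0 - sum(freqs.get(a, 0.0) for a in core[1:-1])
        else:                                # a single required residue
            p = freqs.get(core, 0.0)
        probs.extend([p] * count)
    return probs


def expected_number(pattern, sequence):
    """Expected number of chance matches of `pattern` in `sequence`."""
    counts = Counter(sequence)
    freqs = {a: n / len(sequence) for a, n in counts.items()}
    probs = position_probabilities(pattern, freqs)
    windows = max(len(sequence) - len(probs) + 1, 0)  # possible start positions
    product = 1.0
    for p in probs:
        product *= p
    return windows * product
```

Under these assumptions, a pattern with many restrictive positions gives a
very small expected number, so even a single match is unlikely to be due to
chance.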
The PSITE output shows each PROSITE pattern found, its position in the sequence,
its accession number, ID and description in the PROSITE database, and the
number of the document where the pattern characteristics are outlined.
Note that patterns which PROSITE restricts to the beginning or end of a protein
sequence are searched for along the whole sequence in this version. This may
be useful for the analysis of ORFs or 6-frame translations.
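An exact-match search of this kind can be sketched by translating PROSITE
syntax into a regular expression. This is an illustration only, not the PSITE
implementation; the names prosite_to_regex and find_sites are hypothetical.
By default the sketch drops the terminus anchors ("<" and ">"), so anchored
patterns are matched anywhere in the sequence, as described above.

```python
import re


def prosite_to_regex(pattern, keep_anchors=False):
    """Translate a PROSITE pattern such as 'S-G-x-G.' into a regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            element = element[1:]
            prefix = "^" if keep_anchors else ""
        if element.endswith(">"):            # C-terminal anchor
            element = element[:-1]
            suffix = "$" if keep_anchors else ""
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                      # any residue
            core = "."
        elif core.startswith("{"):           # forbidden residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:                              # x(2) or x(2,4) style repetition
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_sites(pattern, sequence):
    """Return (start, end, site) tuples with 1-based, inclusive positions."""
    # A lookahead is used so that overlapping sites are also reported.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [
        (m.start() + 1, m.start() + len(m.group(1)), m.group(1))
        for m in regex.finditer(sequence)
    ]
```

For the start of the example sequence, find_sites("S-G-x-G.",
"RLLRAIMGAPGSGKGTVSSRITKHF") returns [(12, 15, "SGKG")], which agrees with
the site reported in the example output.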
Acknowledgments: We acknowledge Ilgam Sahmuradov and Igor Rogozin, who
took part in the development of some applications of this method for
nucleotide consensus searching, and Asya Salihova for
protein site searching on the IBM PC.
Submitting sequences via WWW:
Paste your amino acid sequence into the WWW page window:
RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGV
KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRA
QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
PLVQREDDRPETVVK............
(Restrict the line length to 75 characters).
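A sequence stored as one long line can be broken into lines of at most 75
characters before pasting. The helper below is a small sketch; the name
wrap_sequence is hypothetical and is not part of PSITE.

```python
def wrap_sequence(sequence, width=75):
    """Split a sequence into lines of at most `width` residues."""
    residues = "".join(sequence.split())  # drop spaces and line breaks first
    return [residues[i:i + width] for i in range(0, len(residues), width)]
```

Printing "\n".join(wrap_sequence(seq)) then gives text that respects the
75-character limit.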
Example of PSITE output:
PSITE V1 - search for Prosite patterns
10 20 30 40 50 60
RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLAKTFIDQGKLI
70 80 90 100 110 120
PDDVMTRLVLHELKN*TQYNWLLDGFPRTLPQAEALDRAYQIDTVINLNVPFEVIKQRLT
130 140 150 160 170 180
ARWIHPGSGRVYNIEFNPPKTMGIDDLTGEPLVQREDDRPETVVKRLKAYEAQTEPVLEY
190 200 210 220 230 240
YRKKGVLETFSYTETNKIWPHVYAFLQTKLPDANKDDALDQREWSAAAAWLAAAAALDLN
250 260 270 280 290 300
AGCPAAALAAAAAGSAACAAAAAFAAAAAACCAACAAAAAAACAAAADAACGAYAYACAP
ID GLYCOSAMINOGLYCAN; RULE.
AC PS00002;
DE Glycosaminoglycan attachment site.
DO PDOC00002;
PA S-G-x-G.
Sites found: 1 Expected number: 0.0272 95% confidential interval: 0
# Start End Expected Site sequence
1 12 15 0.0272 SGKG
ID EF_HAND; PATTERN.
AC PS00018;
DE EF-hand calcium-binding domain.
DO PDOC00018;
PA D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-
PA [DE]-[LIVMFYW].
Sites found: 1 Expected number: 0.0004 95% confidential interval: 0
# Start End Expected Site sequence
1 212 224 0.0004 DANKDDALDQREW
ID ADENYLATE_KINASE; PATTERN.
AC PS00113;
DE Adenylate kinase signature.
DO PDOC00104;
PA [LIVMFYW](3)-D-G-[FY]-P-R-x(3)-[NQ].
Sites found: 1 Expected number: 0.0000 95% confidential interval: 0
# Start End Expected Site sequence
1 81 92 0.0000 WLLDGFPRTLPQ
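Output in this layout can be read back into structured records for further
processing. The parser below is a sketch based only on the example above; it
assumes that every record starts with an ID line and that site rows have the
five columns shown. The name parse_psite_report is hypothetical.

```python
import re


def parse_psite_report(text):
    """Parse PSITE-style output into a list of dictionaries, one per pattern."""
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        tag, _, rest = line.partition(" ")
        if tag == "ID":                      # a new pattern record begins
            current = {"ID": rest.split(";")[0], "PA": "", "sites": []}
            records.append(current)
        elif current is None:                # sequence listing before records
            continue
        elif tag in ("AC", "DE", "DO"):
            current[tag] = rest.strip().rstrip(";.")
        elif tag == "PA":                    # patterns may span several PA lines
            current["PA"] += rest.strip()
        elif line.startswith("Sites found:"):
            m = re.search(r"Expected number:\s*([\d.]+)", line)
            if m:
                current["expected"] = float(m.group(1))
        elif re.fullmatch(r"\d+\s+\d+\s+\d+\s+[\d.]+\s+[A-Z]+", line):
            _, start, end, expected, site = line.split()
            current["sites"].append((int(start), int(end), float(expected), site))
    return records
```

Each returned record keeps the ID, AC, DE, DO and PA fields together with the
list of sites as (start, end, expected, site sequence) tuples.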
Reference:
Solovyev V.V., Kolchanov N.A. 1994.
Search for functional sites using consensus.
In: Computer analysis of Genetic macromolecules (eds. Kolchanov N.A., Lim
H.A.), World Scientific, p.16-21.
=======================================================================================
The other programs in BCM Gene-Finder service:
========================================================================================
Analysis of uncharacterized human sequences is available through the
Weizmann Institute of Science
Gene-Server by sending a file containing a sequence (with the sequence name on the first line)
to services at bioinformatics.weizmann.ac.il with the subject line "fgenehb".
Example: mail -s fgenehb services at bioinformatics.weizmann.ac.il < test.seq
where test.seq is a file with the sequence.
You can also use the WWW BCM Human Genome Center Search Launcher
home page, URL: http://kiwi.imgen.bcm.tmc.edu:8088/search-launcher/launcher.html
for access to the Gene-Finder prediction help files and programs (-> BCM Gene Finder),
or go directly to: http://dot.imgen.bcm.tmc.edu:9331/gene-finder/gf.html
----------------------------------------------------------------------------------
Questions: solovyev at cmb.bcm.tmc.edu
================== The services are ===============================================================
FGENEHB - search for Mammalian gene structure with exon assembly by dynamic programming,
using similarity information with known proteins from database scanning with FASTA
FEXHB - search for Mammalian coding exons using exon recognition functions and similarity information
with known proteins from database scanning with FASTA
(the above 2 programs are available by ftp to run locally,
the others can be used
through WWW and Email servers of Houston University and Weizmann Institute of Science):
mail -s fgeneh services at bioinformatics.weizmann.ac.il < test.seq
mail -s fexh service at theory.bchs.uh.edu < test.seq
FGENEH - search for Mammalian gene structure with exon assembly by dynamic
programming
FEXH - search for 5'-, internal and 3'-exons
HEXON - search for internal exons
HSPL - search for splice sites
RNASPL - prediction of exon-exon junctions in cDNA sequences
CDSB - prediction of Bacterial coding regions
HBR - recognition of human and bacterial sequences to test a library
for E. coli contamination by sequencing example clones
TSSG - recognition of human promoter regions (Ghosh/Prestridge motif data)
TSSW - recognition of human promoter regions (Wingender motif database)
POLYAH - recognition of the 3'-end cleavage and polyadenylation region
of human mRNA precursors
FGENED - search for Drosophila gene structure with exon assembly by dynamic
programming
FEXD - search for Drosophila 5'-, internal and 3'-exons
DSPL - search for Drosophila splice sites
FGENEN - search for Nematode gene structure with exon assembly by dynamic
programming
FEXN - search for Nematode 5'-, internal and 3'-exons
NSPL - search for Nematode splice sites
FGENEA - search for Plant gene structure with exon assembly by dynamic
programming
FEXA - search for Plant 5'-, internal and 3'-exons
ASPL - search for Plant splice sites
============================================================================
WWW address: http://dot.imgen.bcm.tmc.edu:9331/pssprediction/pssp.html
SSP - prediction of a-helix and b-strand in globular proteins
by segment-oriented approach.
NSSP - prediction of a-helix and b-strand segments in globular proteins
by nearest-neighbor algorithm.