The Baylor College of Medicine Computational Biology Group
Houston, TX
announces a new service
NNSSP
Prediction of protein secondary structure by combining
nearest-neighbor algorithms and multiple sequence alignments
(version 1. 10.5.94)
***************************************************************************
*********** NOTE ADDRESSES AND FORMATS HAVE CHANGED!! *********************
***************************************************************************
Analysis of protein primary sequences is available through the
University of Houston Gene-Server. Send a file containing the
sequence (with the sequence name on the first line) to
service at bchs.uh.edu
with the subject line "nnssp".
Example: mail -s nnssp service at bchs.uh.edu < test.seq
where test.seq is a file containing the sequence.
Method description:
**********************
Yi and Lander (*) developed a neural-network and nearest-neighbor
method with a scoring system that combined a sequence similarity
matrix with the local structural environment scoring scheme of Bowie
et al. (**) for predicting protein secondary structure. We have
improved their scoring system by taking into consideration the N- and
C-terminal positions of a-helices and b-strands, and also b-turns, as
distinct types of secondary structure. Another improvement, which
also significantly decreases the computation time, is to restrict the
database to a smaller subset of proteins that are similar to the
query sequence. Using multiple sequence alignments rather than single
sequences, together with a simple jury decision method, we achieved an
overall three-state accuracy of 72.2%, which is better than that
observed for the most accurate multilayered neural network approach
tested on the same data set of 126 non-homologous protein chains.
(*) Yi T.-M., Lander E.S. (1993)
Protein secondary structure prediction using nearest-neighbor methods.
J. Mol. Biol., 232, 1117-1129.
(**) Bowie J.U., Luthy R., Eisenberg D. (1991)
A method to identify protein sequences that fold into a known
three-dimensional structure.
Science, 253, 164-170.
Accuracy:
************************
Overall three-state (a, b, c) prediction yields ~67.6% correctly
predicted residues on 126 non-homologous proteins using the jack-knife
test procedure.
Using multiple sequence alignments instead of single sequences increases
the prediction accuracy to 72.2%.
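The three-state accuracy quoted above is the percentage of residues whose predicted state (a, b or c) matches the observed state. The following is only an illustrative sketch of that arithmetic; the function name and the toy strings are hypothetical and are not part of NNSSP.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the two state strings agree.

    Both strings use one character per residue: 'a', 'b' or 'c'.
    (Hypothetical helper for illustration; not part of the NNSSP service.)
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)


# Toy data: 9 of the 10 positions agree.
predicted = "aaaabbbccc"
observed = "aaaabbcccc"
print(f"{q3_accuracy(predicted, observed):.1f}%")
```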
Submitting sequences via email:
***********************************
For email submission, the sequences must have the following format:
a) if you send one sequence:
line 1 - sequence name
line 2 - the number 1 in format I5 (Fortran I5: an integer
         right-justified in 5 columns)
line 3 and subsequent lines - amino acid sequence
for example:
ADENYLATE KINASE
1
RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
PLVQREDDRPETVVK............
(Restrict the line length to 75 characters).
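A submission file in format (a) could be generated with a short script. The sketch below is an assumption-laden illustration: the function name is hypothetical, the "1" is written right-justified in five columns as Fortran I5 implies, and the sequence is wrapped at 50 residues per line as in the example above (any width up to 75 should satisfy the stated limit).

```python
def format_single_sequence(name: str, sequence: str, width: int = 50) -> str:
    """Build the text of a one-sequence submission file.

    Hypothetical helper for illustration; not distributed with NNSSP.
    """
    if not 1 <= width <= 75:
        raise ValueError("line width must be between 1 and 75 characters")
    if len(name) > 75:
        raise ValueError("sequence name must not exceed 75 characters")
    # Drop any whitespace or line breaks already present in the sequence.
    seq = "".join(sequence.split())
    lines = [name, f"{1:5d}"]  # the number 1 in I5: four spaces, then "1"
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


text = format_single_sequence(
    "ADENYLATE KINASE",
    "RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA"
    "KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY",
)
print(text, end="")
```

The output can be written to a file such as test.seq and mailed with the command shown earlier.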
b) if you send multiple aligned sequences:
line 1 - sequence name
line 2 - number of aligned sequences and length of the protein
line 3 and subsequent lines - aligned sequences in format 60a1
         (60 characters per line)
for example:
ACTINOXANTHIN
5 107
        10        20        30        40        50        60 (numbers not necessary)
APAFSVSPASGASDGQSVSVSVAAAGETYYIAQaAPVGGQDAaNPATATSFTTDASGAAS
APAFSVSPASGLSDGQSVSVSGAAAGETYYIAQCAPVGGQDACNPATATSFTTDASGAAS
APTATVTPSSGLSDGTVVKVAGAgaGTAYDVGQCAWVdgVLACNPADFSSVTADANGSAS
APGVTVTPATGLSNGQTVTVSATgpGTVYHVGQCAVvpGVIGCDATTSTDVTADAAGKIT
ATPKSSSGGAGASTGSGTSSAAVTSgaASSAQQSGLQGATGAGGGSSSTPGTQPGSGAGG
        70        80        90       100
FSFTVRKSYAGQTPSGTPVGSVDbATDAbNLGAGNSGLNLGHVALTF
FSFV-RKSYAGZTPSGTPVGSVDCATDACNLGAGNSGLNLGHVALTF
TSLTVRRSFEGFLFDGTRWGTVDCTTAACQVGLSDAAGNGpgVAISF
AQLKVHSSFQAVvaNGTPWGTVNCKVVSCSAGLGSDSGEGAAQAITF
AIAARPVSAMGGtpPHTVPGSTNTTTTAMAGGVGGPgaNPNAAALM-
(You may use lowercase letters for Cys amino acids if you wish.)
The alignment MUST NOT contain deletions in the first (query) sequence!!!
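Format (b) can be produced in the same way. This sketch rests on several assumptions read off the example above rather than from a formal specification: the count and length on line 2 are separated by a single space, the alignment is written in interleaved blocks of 60 columns, and "-" is the deletion character. The function name is hypothetical.

```python
def format_alignment(name: str, sequences: list[str], block: int = 60) -> str:
    """Build the text of a multiple-alignment submission file.

    sequences[0] is the query; all rows must have equal length.
    Hypothetical helper for illustration; not distributed with NNSSP.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    query = sequences[0]
    length = len(query)
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    if "-" in query:
        # The service requires a query row without deletions.
        raise ValueError("the first (query) sequence must not contain '-'")
    lines = [name, f"{len(sequences)} {length}"]
    # Emit every row for columns 1-60, then every row for 61-120, and so on.
    for start in range(0, length, block):
        lines.extend(s[start:start + block] for s in sequences)
    return "\n".join(lines) + "\n"


rows = ["ACDEFGHIKL" * 7, "ACDEFGHIK-" * 7]  # toy alignment, 70 columns
print(format_alignment("TOY PROTEIN", rows), end="")
```

Checking the query row before mailing avoids a rejected job, since only the other rows may contain deletions.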
Send the file containing the sequence to:
service at bchs.uh.edu
The subject line must be:
nnssp
Example: mail -s nnssp service at bchs.uh.edu < test.seq
Example of NNSSP output:
*****************************
ADENYLATE KINASE
10 20 30 40 50
Predic aaaaaaa bbb aaaaaaaa aa
a/acid RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
60 70 80 90 100
Predic aaaaaa aaaaaaaaaaaaaa aaaaaaaaaa
a/acid KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
110 120 130 140 150
Predic bbbb aaaaaaaa bb bbbbbb
a/acid QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
160 170 180 190 200
Predic aaaaaaaaaaa aaaaaaaaaa bb aaa
a/acid PLVQREDDRPETVVKRLKAYEAQTEPVLEYYRKKGVLETFSGTETNKIWP
210
Predic aaaaaaaa
a/acid HVYAFLQTKLPQRS
Reference:
Salamov A.A., Solovyev V.V. (1994)
Prediction of protein secondary structure by combining nearest-neighbor
algorithms and multiple sequence alignments.
Submitted to J. Mol. Biol.
Problems, comments, and suggestions
can be mailed to solovyev at cmb.bcm.tmc.edu.