*************** NetOglyc Mail Server V1.0 ***************
Prediction of Mucin type O-glycosylation of mammalian proteins
Center for Biological Sequence Analysis
The Technical University of Denmark
DK-2800 Lyngby, Denmark
DESCRIPTION:
The NetOglyc mail server is a service producing neural network
predictions of mucin type O-glycosylation sites in mammalian proteins as
described in:
J.E. Hansen, O. Lund, J. Engelbrecht, H. Bohr, J.O. Nielsen, J.E.S.
Hansen and S. Brunak, Prediction of O-glycosylation of mammalian
proteins: Specificity patterns of UDP-GalNAc:polypeptide
N-acetylgalactosaminyltransferase.
The Biochemical Journal, 308, 801-813, 1995.
ABSTRACT:
The specificity of the enzyme(s) catalyzing the covalent link between
the hydroxyl side-chains of serine or threonine and the sugar moiety
GalNAc is unknown. Pattern recognition by artificial neural networks
and weight matrix algorithms was performed to determine the exact
position of in vivo O-linked GalNAc glycosylated serine and threonine
residues from the primary sequence exclusively. The acceptor sequence
context for O-glycosylation of serine was found to differ from that of
threonine and the two types were therefore treated separately. The
context of the sites showed a high abundance of proline, serine and
threonine extending far beyond the previously reported region covering
positions -4 through +4 relative to the glycosylated residue. The
O-glycosylation sites were found to cluster and to have a high
abundance in the amino-terminal part of the protein. The sites were
also found to have an increased preference for three different classes
of beta-turns. No simple consensus like rule could be deduced for the
complex glycosylation sequence acceptor patterns. The neural networks
were trained on the hitherto largest data material consisting of 48
carefully examined mammalian glycoproteins comprising 264
O-glycosylation sites. For detection neural network algorithms were
much more reliable than weight matrices. The networks correctly found
60-95% of the O-glycosylated serine/threonine residues and 89-97% of
the non-glycosylated residues in two independent test sets of known
glycoproteins. A computer server using E-mail for prediction of
O-glycosylation sites has been implemented and made publicly available.
FURTHER INFORMATION:
The NetOglyc server returns a help file if the submitted file contains
the word `help'.
CONFIDENTIALITY
Your submitted sequences will be deleted automatically immediately
after processing by NetOglyc.
PAPER TO REFERENCE IN REPORTING RESULTS:
Jan E. Hansen, Ole Lund, Jacob Engelbrecht, Henrik Bohr, Jens O.
Nielsen, John-E.S. Hansen, and Soren Brunak. Prediction of
O-glycosylation of mammalian proteins: Specificity patterns of
UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase. Biochemical
Journal 308, 801-813, 1995.
COMMENTS AND SUGGESTIONS:
Since an expanded data set with additional O-glycosylated sequences
would increase the performance of the network, we are very interested
in receiving such material. If you have knowledge of experimentally
determined O-glycosylation sites in glycoproteins not already in the
data set (see reference Biochem. J. 308, 801-813, 1995.) we would like
to include them. Any other comments regarding the predictions or the
data may be sent to:
Jan Hansen (janhan at cbs.dtu.dk)
Center for Biological Sequence Analysis
The Technical University of Denmark
Building 206
DK-2800 Lyngby
Denmark
Tel: +45 45252485
Fax: +45 45934808
PROBLEMS:
Should be addressed to:
Kristoffer Rapacki (rapacki at cbs.dtu.dk)
or
Karsten Dalsgaard (karsten at cbs.dtu.dk)
Center for Biological Sequence Analysis
The Technical University of Denmark
Building 206
DK-2800 Lyngby
Denmark
Tel: +45 45252477
Fax: +45 45934808
-----------------------------------------------------------------------
INSTRUCTIONS for using the NetOglyc mail server:
In order to use the mail server for prediction on amino acid sequences:
1) Prepare a text file containing one or more sequences. Each sequence
must be preceded by a line starting with the symbol > followed by a name
(identifier) of the sequence.
The following lines contain the sequence. Every line of a sequence must
contain at least one character.
The sequences must be submitted using the one letter abbreviations for
the amino acids: `acdefghiklmnpqrstvwyACDEFGHIKLMNPQRSTVWY'.
N.B. Other characters will be accepted but are not encoded in the
network window when the prediction is made.
Example: Create a text file `sequence.txt' using an editor. The contents
of the file may look like this:
>seq_name1
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVY
GETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASG
NNFVECT
>seq_name2
TELKAVAHQPTGYTMVPFRVDPPNEVTVEDKDRMTLEKVVFESHKCVVLGSHIVHAKMEVGDLAATKG
GHAWAMGFAETIPMYFEIAYAETPKSANAAVIYPKGD
2) Mail the text file to NetOglyc at cbs.dtu.dk:
In the UNIX environment you may mail the text file `sequence.txt' to
NetOglyc at genome.cbs.dtu.dk by typing:
mail NetOglyc at .cbs.dtu.dk < sequence.txt
3) You will receive a mail containing the prediction, or possibly error
messages from the server. If the file contains the word `help', this
help file will be returned. Response time depends on system load.
4) A WWW server (http://www.cbs.dtu.dk/) may also be used.
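
The file layout described in step 1 can also be produced and checked by
a short program. The Python sketch below is an illustration only and is
not part of the NetOglyc server. The function names `format_submission'
and `unencoded_residues' are hypothetical, and the line width of 68
characters is simply taken from the example above.

```python
# Illustrative helper, not part of NetOglyc: builds a submission file
# in the ">name" + sequence-lines layout described in step 1.

# The 20 one-letter amino acid codes listed in the instructions.
VALID = set("ACDEFGHIKLMNPQRSTVWY")


def format_submission(records, width=68):
    """Return the text of a submission file.

    records: iterable of (name, sequence) pairs.
    width:   assumed maximum number of residues per line.
    """
    lines = []
    for name, seq in records:
        seq = "".join(seq.split())  # drop spaces and line breaks
        if not name or not seq:
            raise ValueError("each record needs a name and a sequence")
        lines.append(">" + name)
        # Wrapping this way never produces an empty sequence line.
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def unencoded_residues(seq):
    """Return the characters in seq outside the 20-letter alphabet.

    According to the instructions, such characters are accepted by the
    server but are not encoded in the network window.
    """
    return sorted({c for c in seq if c.upper() not in VALID})
```

Calling `unencoded_residues' before submission may help to spot
characters that the network will not encode.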
FORMAT OF NetOglyc PREDICTION OUTPUT:
IDENTIFIER: <sequence name>
LENGTH: <length of sequence in amino acids>
DISTRIBUTION: <number of predicted O-glycosylations>
SSTTGVAMHTSTSSSVTKSYISSQT <sequence>
s........s.s.....s..s... <Predicted assignment (serine)>
SINGLE RESIDUE ACTIVITIES:
ID <sequence name>
POSITION <position in sequence of serines or threonines>
RESIDUE <amino acid>
ASSIGNMENT <predicted assignment: t or s=O-glycosylated threonine or serine, .=non-glycosylated>
ACTIVITY <prediction strength; values above the threshold of 0.5
denote O-glycosylated serine or threonine>
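
The relation between ACTIVITY and ASSIGNMENT can be written as a small
Python sketch. It is an illustration only and assumes that the
comparison is strictly "greater than 0.5" and that the same threshold
applies to serine and threonine. The function name `assign' is
hypothetical.

```python
# Illustrative only: maps a prediction strength to an assignment symbol.

THRESHOLD = 0.5  # threshold stated in the output description


def assign(residue, activity, threshold=THRESHOLD):
    """Return 's' or 't' for a predicted site, '.' otherwise."""
    residue = residue.upper()
    if residue not in ("S", "T"):
        raise ValueError("only serine (S) and threonine (T) are assigned")
    return residue.lower() if activity > threshold else "."
```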
EXAMPLE OF OUTPUT OF PREDICTION OF seq_name1 mentioned above.
NetOglyc Mail Server Output
Prediction for: THREONINE RESIDUES
Message 1/1 From genome mail server Jun 26 '95 at 12:27 pm
120
IDENTIFIER: seq_name1 LENGTH: 143
DISTRIBUTION: t: 1
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYGETVGSNSYPHK
YNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
................t..............................................................
..............................................................
SINGLE RESIDUE ACTIVITIES:
(ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY)
seq_name1 5 T . 0.150
seq_name1 18 T t 0.522
seq_name1 32 T . 0.283
seq_name1 71 T . 0.376
seq_name1 130 T . 0.188
seq_name1 132 T . 0.312
seq_name1 143 T . 0.157
NetOglyc Mail Server Output
Prediction for: SERINE RESIDUES
IDENTIFIER: seq_name1 LENGTH: 143
DISTRIBUTION: s: 1
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYGETVGSNSYPHK
YNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
...............................................................................
....................s.........................................
SINGLE RESIDUE ACTIVITIES:
(ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY)
seq_name1 8 S . 0.243
seq_name1 12 S . 0.181
seq_name1 13 S . 0.290
seq_name1 14 S . 0.404
seq_name1 17 S . 0.043
seq_name1 35 S . 0.186
seq_name1 37 S . 0.227
seq_name1 51 S . 0.089
seq_name1 53 S . 0.087
seq_name1 54 S . 0.046
seq_name1 63 S . 0.390
seq_name1 64 S . 0.075
seq_name1 74 S . 0.077
seq_name1 76 S . 0.203
seq_name1 90 S . 0.089
seq_name1 92 S . 0.087
seq_name1 93 S . 0.046
seq_name1 102 S s 0.618
seq_name1 103 S . 0.177
seq_name1 108 S . 0.202
seq_name1 111 S . 0.197
seq_name1 135 S . 0.120
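
The SINGLE RESIDUE ACTIVITIES rows shown above can be read back into a
program for further analysis. The Python sketch below is an
illustration only. It assumes that each row consists of five fields
separated by whitespace, as in the example, and the names `SiteRow',
`parse_activity_rows' and `predicted_sites' are hypothetical.

```python
# Illustrative parser for the per-residue rows of a NetOglyc reply.
from typing import NamedTuple


class SiteRow(NamedTuple):
    seq_id: str
    position: int
    residue: str
    assignment: str
    activity: float


def parse_activity_rows(text):
    """Collect rows of the form: ID POSITION RESIDUE ASSIGNMENT ACTIVITY.

    Lines that do not match this shape (headers, sequence lines,
    assignment strings) are skipped.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue
        seq_id, pos, res, assignment, act = parts
        if res not in ("S", "T") or not pos.isdigit():
            continue
        try:
            rows.append(SiteRow(seq_id, int(pos), res, assignment, float(act)))
        except ValueError:
            continue  # activity field was not a number
    return rows


def predicted_sites(rows):
    """Return (ID, position, residue) for rows not assigned '.'."""
    return [(r.seq_id, r.position, r.residue)
            for r in rows if r.assignment != "."]
```

Applied to the two example replies above, `predicted_sites' would be
expected to report threonine 18 and serine 102 of seq_name1.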
CURRENT NETWORK
The network will be updated, and predictions may change between
versions. The network is balanced to give optimal predictions whether or
not the submitted sequences are homologous to the known O-glycosylated
proteins. If, however, a submitted sequence is very close or identical
to a sequence in our training data set, we will notify you by sending
both the assignment of the homologous (or identical) sequence in our
data set and the prediction.
Jan Hansen (janhan at cbs.dtu.dk)