*************** NetOglyc Mail Server V1.0 ***************
Prediction of Mucin type O-glycosylation of mammalian proteins
Center for Biological Sequence Analysis, The Technical University of Denmark,
DK-2800 Lyngby, Denmark
DESCRIPTION:
The NetOglyc mail server is a service producing neural network predictions of
mucin type O-glycosylation sites in mammalian proteins as described in:
J.E. Hansen, O. Lund, J. Engelbrecht, H. Bohr, J.O. Nielsen, J-E.S. Hansen
and S. Brunak.
Prediction of O-glycosylation of mammalian proteins: Specificity patterns of
UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase.
The Biochemical Journal, 308, 801-813, 1995.
ABSTRACT:
The specificity of the enzyme(s) catalyzing the covalent link between the
hydroxyl side-chains of serine or threonine and the sugar moiety GalNAc is
unknown. Pattern recognition by artificial neural networks and weight matrix
algorithms was performed to determine the exact position of in vivo O-linked
GalNAc glycosylated serine and threonine residues from the primary sequence
exclusively. The acceptor sequence context for O-glycosylation of serine was
found to differ from that of threonine and the two types were therefore treated
separately. The context of the sites showed a high abundance of proline, serine
and threonine extending far beyond the previously reported region covering
positions -4 through +4 relative to the glycosylated residue. The
O-glycosylation sites were found to cluster and to have a high abundance in the
amino-terminal part of the protein. The sites were also found to have an
increased preference for three different classes of beta-turns. No simple
consensus like rule could be deduced for the complex glycosylation sequence
acceptor patterns. The neural networks were trained on the hitherto largest
data material consisting of 48 carefully examined mammalian glycoproteins
comprising 264 O-glycosylation sites. For detection neural network algorithms
were much more reliable than weight matrices. The networks correctly found
60-95% of the O-glycosylated serine/threonine residues and 89-97% of the
non-glycosylated residues in two independent test sets of known glycoproteins.
A computer server using E-mail for prediction of O-glycosylation sites has
been implemented and made publicly available.
FURTHER INFORMATION:
The NetOglyc server returns a help file if the submitted file contains the word
`help'.
CONFIDENTIALITY
Your submitted sequences will be deleted automatically immediately after
processing by NetOglyc.
PAPER TO REFERENCE IN REPORTING RESULTS:
Jan E. Hansen, Ole Lund, Jacob Engelbrecht, Henrik Bohr, Jens O. Nielsen,
John-E.S. Hansen, and Soren Brunak. Prediction of O-glycosylation of mammalian
proteins: Specificity patterns of UDP-GalNAc:polypeptide
N-acetylgalactosaminyltransferase. Biochemical Journal 308, 801-813, 1995.
COMMENTS AND SUGGESTIONS:
Since an expanded data set with additional O-glycosylated sequences would
increase the performance of the network, we are very interested in receiving
such material. If you have knowledge of experimentally determined
O-glycosylation sites in glycoproteins not already in the data set (see
Biochem. J. 308, 801-813, 1995), we would like to include them. Any
other comments regarding the predictions or the data may be sent to:
Jan Hansen (janhan at cbs.dtu.dk)
Center for Biological Sequence Analysis, The Technical University of Denmark,
Building 206, DK-2800 Lyngby, Denmark
Tel: +45 45252485 Fax: +45 45934808
PROBLEMS:
Should be addressed to:
Kristoffer Rapacki (rapacki at cbs.dtu.dk)
or
Karsten Dalsgaard (karsten at cbs.dtu.dk)
Center for Biological Sequence Analysis, The Technical University of Denmark,
Building 206, DK-2800 Lyngby, Denmark
Tel: +45 45252477 Fax: +45 45934808
-----------------------------------------------------------------------
INSTRUCTIONS for using the NetOglyc mail server:
In order to use the mail server for prediction on amino acid sequences:
1) Prepare a text file including one or more sequences. Each sequence must be
preceded by a line starting with the symbol > followed by a name
(identifier) of the sequence. The following lines contain the sequence. There
must be at least one character on each line of each sequence. Note: Any
character after the symbol > will be interpreted as sequence.
The sequences must be submitted using the one letter abbreviations for the
amino acids: `acdefghiklmnpqrstvwyACDEFGHIKLMNPQRSTVWY'. N.B. Other characters
will be accepted, but not encoded in the network window, when making the
prediction.
Example: Create a text file `sequence.txt' using an editor. The contents of
the file may look like this:
>seq_name1
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVY
GETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASG
NNFVECT
>seq_name2
TELKAVAHQPTGYTMVPFRVDPPNEVTVEDKDRMTLEKVVFESHKCVVLGSHIVHAKMEVGDLAATKG
GHAWAMGFAETIPMYFEIAYAETPKSANAAVIYPKGD
2) Mail the text file to NetOglyc at cbs.dtu.dk:
In the UNIX environment you may mail the text file `sequence.txt' to
NetOglyc at genome.cbs.dtu.dk by typing:
mail NetOglyc at .cbs.dtu.dk < sequence.txt
3) You will receive a mail containing the prediction, or possibly error
messages from the server. If the file contains the word `help', this help file
will be returned. Response time depends on system load.
4) A www server: http://www.cbs.dtu.dk/netOglyc/cbsnetOglyc.html may also be
used.
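Step 1 of the instructions above can be sketched in Python. This is only an
illustration and not part of the NetOglyc server: the function names
`format_submission` and `unencoded_characters` are hypothetical, and the
68-character line width is an assumption taken from the example file, not a
documented requirement.

```python
# Letters the help text lists as encoded in the network window.
ENCODED = set("acdefghiklmnpqrstvwyACDEFGHIKLMNPQRSTVWY")


def unencoded_characters(sequence):
    """Return the distinct characters that are accepted but not encoded."""
    return sorted(set(sequence) - ENCODED)


def format_submission(records, width=68):
    """Build submission text from (name, sequence) pairs.

    Each record becomes a '>' header line followed by the sequence,
    wrapped so that no sequence line is empty.
    """
    lines = []
    for name, sequence in records:
        if not name or not sequence:
            raise ValueError("each record needs a name and a non-empty sequence")
        lines.append(">" + name)
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    records = [("seq_name2", "TELKAVAHQPTGYTMVPFRVDPPNEVTVEDKDRMTLEKVVFESHKCVVLGSHIVHAKMEVGDLAATKG")]
    for name, sequence in records:
        odd = unencoded_characters(sequence)
        if odd:
            print(name, "contains characters that will not be encoded:", odd)
    with open("sequence.txt", "w") as handle:
        handle.write(format_submission(records))
```

Checking for unencoded characters before mailing is optional, since the server
accepts them, but it makes accidental typing errors visible early.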
FORMAT OF NetOglyc PREDICTION OUTPUT:
IDENTIFIER: <sequence name>
LENGTH: <length of sequence in amino acids>
DISTRIBUTION: <number of predicted O-glycosylations>
SSTTGVAMHTSTSSSVTKSYISSQT <sequence>
.s........s.s.....s..s... <Predicted O-glycosylated assignment (serine)>
SINGLE RESIDUE ACTIVITIES:
ID <sequence name>
POSITION <position in sequence of serines or threonines>
RESIDUE <amino acid>
ASSIGNMENT <predicted assignment: s or t=O-glycosylated, .=non-glycosylated>
ACTIVITY <prediction strength; values above the threshold of 0.5 mean
O-glycosylated serine or threonine>
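The relation between ACTIVITY and ASSIGNMENT described above can be sketched
as follows. Only the 0.5 threshold and the `s`, `t` and `.` symbols come from
the description; the function name `assignment_line` and the dictionary input
(1-based position mapped to activity) are illustrative assumptions, and the
sketch does not reproduce how the server itself computes activities.

```python
def assignment_line(sequence, activities, threshold=0.5):
    """Render the dot/letter assignment line for a sequence.

    `activities` maps 1-based positions of serines or threonines to their
    activity. A residue is marked with its lower-case letter only when its
    activity is strictly above the threshold; everything else is a dot.
    """
    symbols = []
    for position, residue in enumerate(sequence, start=1):
        score = activities.get(position)
        if residue.upper() in "ST" and score is not None and score > threshold:
            symbols.append(residue.lower())
        else:
            symbols.append(".")
    return "".join(symbols)
```

Because the server reports serine and threonine predictions separately,
passing only the serine (or only the threonine) activities yields a line
comparable to the one shown in the corresponding output section.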
EXAMPLE OF OUTPUT OF PREDICTION OF seq_name1 mentioned above.
NetOglyc Mail Server Output Prediction for: THREONINE RESIDUES
Message 1/1 From NetOglyc mail server Jun 26 '95 at 12:27 pm 120
IDENTIFIER: seq_name1 LENGTH: 143 DISTRIBUTION: t: 1
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYGETVGSNSYPHK
YNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
.................t..............................................................
...............................................................
SINGLE RESIDUE ACTIVITIES: (ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY)
seq_name1 5 T . 0.150
seq_name1 18 T t 0.522
seq_name1 32 T . 0.283
seq_name1 71 T . 0.376
seq_name1 130 T . 0.188
seq_name1 132 T . 0.312
seq_name1 143 T . 0.157
NetOglyc Mail Server Output Prediction for: SERINE RESIDUES
IDENTIFIER: seq_name1 LENGTH: 143 DISTRIBUTION: s: 1
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYGETVGSNSYPHK
YNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
................................................................................
.....................s.........................................
SINGLE RESIDUE ACTIVITIES: (ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY)
seq_name1 8 S . 0.243
seq_name1 12 S . 0.181
seq_name1 13 S . 0.290
seq_name1 14 S . 0.404
seq_name1 17 S . 0.043
seq_name1 35 S . 0.186
seq_name1 37 S . 0.227
seq_name1 51 S . 0.089
seq_name1 53 S . 0.087
seq_name1 54 S . 0.046
seq_name1 63 S . 0.390
seq_name1 64 S . 0.075
seq_name1 74 S . 0.077
seq_name1 76 S . 0.203
seq_name1 90 S . 0.089
seq_name1 92 S . 0.087
seq_name1 93 S . 0.046
seq_name1 102 S s 0.618
seq_name1 103 S . 0.177
seq_name1 108 S . 0.202
seq_name1 111 S . 0.197
seq_name1 135 S . 0.120
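Rows of the SINGLE RESIDUE ACTIVITIES tables shown above are
whitespace-separated and can be read with a short parser. The following is a
sketch under stated assumptions: each row has exactly five fields in the order
(ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY), identifiers contain no
whitespace, and every other line of the reply can be ignored. The names
`ResidueActivity`, `parse_activity_rows` and `predicted_sites` are
hypothetical.

```python
from typing import NamedTuple


class ResidueActivity(NamedTuple):
    identifier: str
    position: int
    residue: str
    assignment: str
    activity: float


def parse_activity_rows(text):
    """Collect the five-field activity rows from a server reply."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Headers, sequence lines and blank lines do not have this shape.
        if len(fields) != 5 or not fields[1].isdigit():
            continue
        try:
            activity = float(fields[4])
        except ValueError:
            continue
        rows.append(
            ResidueActivity(fields[0], int(fields[1]), fields[2], fields[3], activity)
        )
    return rows


def predicted_sites(rows):
    """Return the positions whose assignment is not '.'."""
    return [row.position for row in rows if row.assignment != "."]
```

Applied to the threonine table above, `predicted_sites` would return the single
position 18, and applied to the serine table, position 102.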
CURRENT NETWORK
The network will be updated, and predictions may change between
versions. The network is balanced to give optimal predictions whether
or not the submitted sequences are homologous to the known O-glycosylated
proteins. If, however, a submitted sequence is identical to
a sequence in our training data set, we will notify you by sending
you both the assignment of the identical sequence in our data set
and the prediction.