
ANNOUNCEMENT OF O-GLYCOSYLATION PREDICTION SERVER

Jan Hansen janhan
Tue Jul 4 02:23:15 EST 1995


*************** NetOglyc Mail Server V1.0 ***************

Prediction of Mucin type O-glycosylation of mammalian proteins

Center for Biological Sequence Analysis The Technical University of Denmark
DK-2800 Lyngby, Denmark

DESCRIPTION:

The NetOglyc mail server is a service producing neural network predictions of
mucin type O-glycosylation sites in mammalian proteins as described in: J.E.
Hansen, O. Lund, J. Engelbrecht, H. Bohr, J.O. Nielsen, J-E.S. Hansen and S.
Brunak, Prediction of O-glycosylation of mammalian proteins: Specificity
patterns of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase.
The Biochemical Journal, 308, 801-813, 1995.

ABSTRACT:

The specificity of the enzyme(s) catalyzing the covalent link between the
hydroxyl side-chains of serine or threonine and the sugar moiety GalNAc is
unknown. Pattern recognition by artificial neural networks and weight matrix
algorithms was performed to determine the exact position of in vivo O-linked
GalNAc glycosylated serine and threonine residues from the primary sequence
exclusively. The acceptor sequence context for O-glycosylation of serine was
found to differ from that of threonine and the two types were therefore treated
separately. The context of the sites showed a high abundance of proline, serine
and threonine extending far beyond the previously reported region covering
positions -4 through +4 relative to the glycosylated residue. The
O-glycosylation sites were found to cluster and to have a high abundance in the
amino-terminal part of the protein. The sites were also found to have an
increased preference for three different classes of beta-turns. No simple
consensus-like rule could be deduced for the complex glycosylation sequence
acceptor patterns. The neural networks were trained on the hitherto largest data
material consisting of 48 carefully examined mammalian glycoproteins comprising
264 O-glycosylation sites. For detection, neural network algorithms were much
more reliable than weight matrices. The networks correctly found 60-95% of the
O-glycosylated serine/threonine residues and 89-97% of the non-glycosylated
residues in two independent test sets of known glycoproteins. A computer server
using E-mail for prediction of O-glycosylation sites has been implemented and
made publicly available.

FURTHER INFORMATION:

The NetOglyc server returns a help file if the submitted file contains the word
`help'.

CONFIDENTIALITY:

Your submitted sequences will be deleted automatically immediately after
processing by NetOglyc.

PAPER TO REFERENCE IN REPORTING RESULTS:

Jan E. Hansen, Ole Lund, Jacob Engelbrecht, Henrik Bohr, Jens O.  Nielsen,
John-E.S. Hansen, and Soren Brunak. Prediction of O-glycosylation of mammalian
proteins: Specificity patterns of UDP-GalNAc:polypeptide
N-acetylgalactosaminyltransferase. Biochemical Journal 308, 801-813, 1995.

COMMENTS AND SUGGESTIONS:

Since an expanded data set with additional O-glycosylated sequences would
increase the performance of the network, we are very interested in receiving
such material. If you know of experimentally determined
O-glycosylation sites in glycoproteins not already in the data set (see
Biochem. J. 308, 801-813, 1995), we would like to include them. Any
other comments regarding the predictions or the data may be sent to:

 Jan Hansen (janhan at cbs.dtu.dk)

 Center for Biological Sequence Analysis The Technical University of Denmark
 Building 206 DK-2800 Lyngby Denmark

 Tel: +45 45252485 Fax: +45 45934808

PROBLEMS:

Should be addressed to:

 Kristoffer Rapacki (rapacki at cbs.dtu.dk)

 or

 Karsten Dalsgaard (karsten at cbs.dtu.dk)

 Center for Biological Sequence Analysis The Technical University of Denmark
 Building 206 DK-2800 Lyngby Denmark

 Tel: +45 45252477 Fax: +45 45934808

-----------------------------------------------------------------------

INSTRUCTIONS for using the NetOglyc mail server:

In order to use the mail server for prediction on amino acid sequences:

1) Prepare a text file containing one or more sequences. Each sequence must be
preceded by a line starting with the symbol > followed by a name (identifier)
for the sequence. The following lines contain the sequence. There must be at
least one character on each line of each sequence. Note: Any character after
the symbol > will be interpreted as sequence.

The sequences must be submitted using the one-letter abbreviations for the
amino acids: `acdefghiklmnpqrstvwyACDEFGHIKLMNPQRSTVWY'. N.B. Other characters
will be accepted but not encoded in the network window when making the
prediction.

Example: Create a text file `sequence.txt' using an editor. The contents of the
file may look like this:

>seq_name1 
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVY
GETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASG
NNFVECT
>seq_name2 
TELKAVAHQPTGYTMVPFRVDPPNEVTVEDKDRMTLEKVVFESHKCVVLGSHIVHAKMEVGDLAATKG
GHAWAMGFAETIPMYFEIAYAETPKSANAAVIYPKGD
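
The input format described in step 1 can be checked locally before mailing.
The following is a minimal Python sketch, not part of the NetOglyc server. The
function names are hypothetical, and it assumes that each line beginning with >
holds only the identifier and that sequence lines are joined in order.

```python
# The 20 one-letter amino acid codes that the server encodes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_sequences(text):
    """Parse '>name' records into a dict mapping name -> joined sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def unencoded_characters(sequence):
    """Return the sorted characters that are accepted but would not be encoded."""
    return sorted(set(sequence.upper()) - VALID_RESIDUES)
```

Calling read_sequences on the contents of `sequence.txt' and then
unencoded_characters on each sequence lists any characters outside the 20
amino acid codes.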

2) Mail the text file to NetOglyc at cbs.dtu.dk:

In the UNIX environment you may mail the text file `sequence.txt' to
NetOglyc at genome.cbs.dtu.dk by typing:

mail NetOglyc at .cbs.dtu.dk < sequence.txt
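
Outside a UNIX shell, the same submission can be assembled with Python's
standard library. This is a sketch under stated assumptions: the function name
is hypothetical, the recipient is left as a parameter and should be taken from
step 2, and the message body is simply the text file, mirroring the `mail'
command.

```python
from email.message import EmailMessage


def build_submission(sequence_text, recipient, sender):
    """Build a plain-text mail whose body is the sequence file contents."""
    msg = EmailMessage()
    msg["To"] = recipient    # server address, supplied by the caller
    msg["From"] = sender     # the prediction is mailed back to this address
    msg.set_content(sequence_text)
    return msg


# Sending is left to the caller, for example through a local mail relay
# (assumed to be available):
#   import smtplib
#   with smtplib.SMTP("localhost") as smtp:
#       smtp.send_message(msg)
```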

3) You will receive a mail containing the prediction, or possibly error
messages from the server. If the file contains the word `help', this help file
will be returned. Response time depends on system load.

4) A WWW server, http://www.cbs.dtu.dk/netOglyc/cbsnetOglyc.html, may also be
used.

--------------------------------------------------------------------------------

FORMAT OF NetOglyc PREDICTION OUTPUT:

IDENTIFIER:     <sequence name>
LENGTH:         <length of sequence in amino acids>
DISTRIBUTION:   <number of predicted O-glycosylations>

SSTTGVAMHTSTSSSVTKSYISSQT <sequence>
s........s.s.....s..s... <Predicted O-glycosylated assignment (serine)>



SINGLE RESIDUE ACTIVITIES:

ID         <sequence name>
POSITION   <position in sequence of serines or threonines>
RESIDUE    <amino acid>
ASSIGNMENT <predicted assignment: s or t=O-glycosylated, .=non-glycosylated>
ACTIVITY   <prediction strength; values above the threshold of 0.5 mean
           O-glycosylated serine or threonine>
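
The five fields above are whitespace-separated in the returned mail, so they
can be read back with a few lines of Python. This is a minimal sketch with
hypothetical names, not code from the server. It assumes one residue per line
and applies the 0.5 threshold stated above.

```python
from typing import NamedTuple


class Activity(NamedTuple):
    seq_id: str
    position: int     # 1-based position in the sequence
    residue: str      # S or T
    assignment: str   # s, t or .
    activity: float   # prediction strength


def parse_activity_line(line):
    """Return an Activity for a data line, or None for any other line."""
    fields = line.split()
    if len(fields) != 5:
        return None
    seq_id, position, residue, assignment, activity = fields
    try:
        return Activity(seq_id, int(position), residue, assignment, float(activity))
    except ValueError:
        return None


def predicted_sites(lines, threshold=0.5):
    """Return the positions whose activity lies above the threshold."""
    parsed = (parse_activity_line(line) for line in lines)
    return [a.position for a in parsed if a is not None and a.activity > threshold]
```

Header lines and separators do not split into five fields with a numeric
position and activity, so they are skipped.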

--------------------------------------------------------------------------------

EXAMPLE OF OUTPUT OF PREDICTION OF seq_name1 mentioned above.

NetOglyc Mail Server Output Prediction for: THREONINE RESIDUES

Message 1/1 From NetOglyc mail server     Jun 26 '95 at 12:27 pm 120

IDENTIFIER: seq_name1 LENGTH:  143 DISTRIBUTION: t:  1
 
ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYGETVGSNSYPHK
YNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
.................t..............................................................
...............................................................

 SINGLE RESIDUE ACTIVITIES:  (ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY)

 seq_name1    5 T . 0.150 
 seq_name1   18 T t 0.522 
 seq_name1   32 T . 0.283
 seq_name1   71 T . 0.376 
 seq_name1  130 T . 0.188 
 seq_name1  132 T . 0.312
 seq_name1  143 T . 0.157

 NetOglyc Mail Server Output Prediction for: SERINE RESIDUES

 IDENTIFIER: seq_name1 LENGTH:  143 DISTRIBUTION: s:  1

ACDYTCGSNCYSSSDVSTAQAAGYKLHEDGETVGSNSYPHKYNNYEGFDFSVSSPYYEWPILSSGDVYGETVGSNSYPHK
YNNYEGFDFSVSSPYYEWPILSSGDVYSGGSPGADRVVFNENNQLAGVITHTGASGNNFVECT
................................................................................
.....................s.........................................

 SINGLE RESIDUE ACTIVITIES:  (ID, POSITION, RESIDUE, ASSIGNMENT, ACTIVITY)

 seq_name1    8 S . 0.243
 seq_name1   12 S . 0.181
 seq_name1   13 S . 0.290
 seq_name1   14 S . 0.404
 seq_name1   17 S . 0.043
 seq_name1   35 S . 0.186
 seq_name1   37 S . 0.227
 seq_name1   51 S . 0.089
 seq_name1   53 S . 0.087
 seq_name1   54 S . 0.046
 seq_name1   63 S . 0.390
 seq_name1   64 S . 0.075
 seq_name1   74 S . 0.077
 seq_name1   76 S . 0.203
 seq_name1   90 S . 0.089
 seq_name1   92 S . 0.087
 seq_name1   93 S . 0.046
 seq_name1  102 S s 0.618
 seq_name1  103 S . 0.177
 seq_name1  108 S . 0.202
 seq_name1  111 S . 0.197
 seq_name1  135 S . 0.120
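
The dotted assignment lines in the example follow from the per-residue tables:
each position is 1-based and counts residues from the start of the sequence.
The sketch below uses hypothetical function names and is not the server's own
code. It rebuilds an assignment string from a list of predicted positions and
wraps it at 80 columns, the width of the first sequence line in the example.

```python
def assignment_string(length, sites, symbol):
    """Return a string of dots with `symbol` at each 1-based predicted site."""
    marks = ["."] * length
    for position in sites:
        marks[position - 1] = symbol
    return "".join(marks)


def wrap(text, width=80):
    """Split text into consecutive chunks of at most `width` characters."""
    return [text[i:i + width] for i in range(0, len(text), width)]


# seq_name1 has 143 residues with one predicted threonine at position 18.
threonine_lines = wrap(assignment_string(143, [18], "t"))
```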

--------------------------------------------------------------------------------

CURRENT NETWORK

The network will be updated, and predictions may change between versions.
The network is balanced to give optimal predictions whether or not the submitted
sequences are homologous to the known O-glycosylated proteins. If, however, a
submitted sequence is identical to a sequence in our training data set,
we will notify you by sending you both the assignment of the identical
sequence in our data set and the prediction.




