from : the Belgian EMBnet Node
Yes, there is a way of predicting signal peptide cleavage sites.
You need the program sigcleave from the egcg package. egcg is
provided on the CD-ROM from GCG, but it is not installed automatically;
you must install it separately.
In the annex below I include the introductory text we provide to our users.
Hopefully, this will help your user.
Greetings,
Dr. Guy Bottu
Annex :
Proteins destined to be exported generally start with a signal peptide and
are made on ribosomes that attach to the ER in eukaryotes and to the
cell membrane in prokaryotes, after which the protein passes through the
membrane and the signal peptide is cleaved off. The signal peptide consists
of a basic N-terminal region, a hydrophobic region and a more
hydrophilic cleavage region.
To detect potential signal peptides in proteins or translated genes,
there is the egcg program sigcleave. The program takes as input a
protein sequence, computes a score for a sliding window 15 amino
acids wide and writes an output file (default extension .sig) with
the position that yields the highest score and, where present, any other
positions that yield a score higher than the Minweight. The cleavage
site is indicated by a "-". By default the program scans for eukaryotic
signal peptides, unless you add the parameter -prokaryote to the command
line.
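The sliding-window idea can be sketched in a few lines of Python. This is
only an illustration, not the sigcleave source: the function name
scan_windows, the toy 3-column weight table and the choice to score
residues missing from the table as 0 are all assumptions made for the
example (the real window is 15 columns wide).

```python
def scan_windows(sequence, weights, width, minweight):
    """Score every window of `width` residues and keep those above minweight.

    `weights` maps a residue letter to a list of `width` per-column scores.
    Residues absent from the table contribute 0 (an assumption of this sketch).
    Returns (score, window_start) pairs, best score first.
    """
    zero_row = [0.0] * width
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(weights.get(aa, zero_row)[col] for col, aa in enumerate(window))
        if score > minweight:
            hits.append((score, start))
    hits.sort(reverse=True)
    return hits


# Hypothetical 3-column table, for illustration only.
toy_weights = {
    "A": [1.0, 0.0, 0.0],
    "L": [0.0, 2.0, 0.0],
    "G": [0.0, 0.0, 3.0],
}
toy_hits = scan_windows("MALGK", toy_weights, width=3, minweight=3.5)
```

The first element of the returned list plays the role of the
highest-scoring position; the remaining elements correspond to the other
positions above the Minweight.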
The actual scoring tables are computed with the help of the frequency
tables in sigweighteuk.dat and sigweightprok.dat. All the 0 values are
first replaced by 1, except those in positions -1 and -3, which are
replaced by 10^(-10); then all values are divided by the Expected number
of amino acids for an average protein and the natural logarithm is
taken.
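As a rough sketch of the transformation just described, assuming the
frequency table has already been read into a Python dictionary (the
parsing of the .dat files is not shown, and the names build_log_weights,
counts, expected and positions are hypothetical):

```python
import math


def build_log_weights(counts, expected, positions, strict=(-3, -1)):
    """Turn raw counts into log-odds weights.

    counts:    residue -> list of observed counts, one per window column
    expected:  residue -> expected count for an average protein
    positions: position label of each column (e.g. ..., -3, -2, -1, 1, 2)
    strict:    positions where a zero count becomes 1e-10 instead of 1
    """
    weights = {}
    for residue, row in counts.items():
        out = []
        for pos, n in zip(positions, row):
            if n == 0:
                n = 1e-10 if pos in strict else 1
            out.append(math.log(n / expected[residue]))
        weights[residue] = out
    return weights


# Toy numbers, not taken from sigweighteuk.dat or sigweightprok.dat.
toy = build_log_weights({"A": [0, 0, 4]}, {"A": 2.0}, positions=[-3, -2, -1])
```

A zero at position -1 or -3 therefore yields a strongly negative weight
(about -23 before the division), which in practice rules out a window
with that residue at that position, while a zero elsewhere is only
mildly penalised.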
The signal peptide found by sigcleave corresponds to the c-region
preceded by the terminal part of the h-region. The program sometimes
proposes a number of cleavage sites that lie in front of the true
cleavage site, but have a lower score.
In order to distinguish significant finds from spurious ones, you can
use the rules of McGeoch: following the starting (N-formyl-)methionine
comes an n-region of at most 11 amino acids with a net charge between -1
and +2 and a h-region of 7 to 20 amino acids with a total amino acid
hydrophilicity according to Kyte and Doolittle of -15 or less. You can
obtain the KD hydrophilicity of the individual amino acids by running
the program peptidestructure with parameter -hwindow=1.
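The check above can be sketched in Python once you have decided where
the n-region and h-region lie. This sketch makes several assumptions:
hydrophilicity is taken as the negated Kyte-Doolittle hydropathy index,
only K and R count as +1 and D and E as -1 (histidine is treated as
neutral), and the caller passes the n-region without the starting
methionine. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def net_charge(region):
    """Net charge counting K/R as +1 and D/E as -1 (assumption: H neutral)."""
    return sum(region.count(aa) for aa in "KR") - sum(region.count(aa) for aa in "DE")


def total_hydrophilicity(region):
    """Sum of negated KD hydropathy values (assumed sign convention)."""
    return -sum(KD_HYDROPATHY[aa] for aa in region)


def passes_mcgeoch(n_region, h_region):
    """Apply the length, charge and hydrophilicity limits quoted in the text."""
    return (
        len(n_region) <= 11
        and -1 <= net_charge(n_region) <= 2
        and 7 <= len(h_region) <= 20
        and total_hydrophilicity(h_region) <= -15
    )


# Made-up regions, for illustration only.
likely = passes_mcgeoch("KKT", "LLLLLLLLLL")
unlikely = passes_mcgeoch("DDE", "LLLLLLLLLL")
```

If your installation reports values on a different scale or with the
opposite sign, adjust total_hydrophilicity accordingly before relying
on the -15 threshold.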