In article <1993Jun30.165111.27075 at comp.bioz.unibas.ch> doelz at urz.unibas.ch writes:
>I need a software which searches SWISS-PROT or similar
>protein sequence database with a proteolytic enzyme and
>a molecular weight.
>
>E.g.,
>
>give a molecular weight range (e.g., 32100-32500 )
>give an Enzyme (e.g. Trypsin)
>
>desired result:
>
>List of entries which contain trypsin fragments matching the molecular
>weight.
>
>If anyone has/knows such software (preferrably, VMS or POSIX)
>please let me know.
>
>Regards
>Reinhard
>====================================================
This is exactly what our automatic server provides under the
name "MassSearch". You can use the server by electronic mail,
or you can get your own copy of Darwin (the system that
performs the search).
For more information, mail a message containing
"help MassSearch" or just "help" to cbrg at inf.ethz.ch
======================================================
Here are some excerpts from the help file:
In some cases, a protein can be recognized by fragmenting it
according to a certain pattern and using the molecular weights
of the fragments as a trace. This method is not effective for
determining the composition of an unknown protein, but it is
effective for identifying an unknown sample if its sequence is
recorded in a protein database.
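The idea of using fragment weights as a trace can be sketched in a few lines of Python. This is only a naive illustration, not the algorithm Darwin uses (which rests on a probability model): it counts how many observed weights fall within a tolerance of some fragment weight of each database entry and ranks the entries by that count. The function names, the tolerance, and the shape of the `database` argument (entry name mapped to a list of fragment weights) are all assumptions made for the example.

```python
def count_matches(observed, theoretical, tolerance=1.0):
    """Count observed weights lying within `tolerance` of any theoretical weight."""
    return sum(
        1
        for m in observed
        if any(abs(m - t) <= tolerance for t in theoretical)
    )


def rank_entries(observed, database, tolerance=1.0):
    """Return (match_count, entry_name) pairs, best match first.

    `database` is a hypothetical mapping: entry name -> list of the
    fragment weights expected for that entry after digestion.
    """
    scores = [
        (count_matches(observed, masses, tolerance), name)
        for name, masses in database.items()
    ]
    # Highest count first; ties broken alphabetically by entry name.
    return sorted(scores, key=lambda s: (-s[0], s[1]))
```

A simple count like this ignores how likely a match is to occur by chance, which is the part the probability foundations are needed for.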
One way of breaking a protein into smaller pieces according
to a certain pattern is to digest it with an enzyme.
For example, trypsin breaks a protein after every
Arginine (R) or after every Lysine (K) not followed by a Proline
(P). AspN breaks a protein before every Aspartic acid (D). A table
of recognized enzymes and their cleavage rules is given below.
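The two rules just quoted can be written down as a small Python sketch. It is not Darwin code; the function name `digest` and the enzyme labels are made up for the example. One assumption to note: the sentence above could be read as applying the proline exception to lysine only, and the sketch applies it to both arginine and lysine, which is the usual convention.

```python
def digest(sequence, enzyme):
    """Split a one-letter protein sequence into fragments for one enzyme."""
    if enzyme not in ("Trypsin", "AspN"):
        raise ValueError("unknown enzyme: " + enzyme)
    seq = sequence.upper()
    if not seq:
        return []
    cuts = []  # index i means "cut between seq[i-1] and seq[i]"
    for i in range(1, len(seq)):
        prev, cur = seq[i - 1], seq[i]
        if enzyme == "Trypsin":
            # Assumption: cut after K or R unless the next residue is P.
            if prev in "KR" and cur != "P":
                cuts.append(i)
        else:  # AspN: cut before every D
            if cur == "D":
                cuts.append(i)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

For instance, `digest("AKRPGDKLR", "Trypsin")` gives the fragments AK, RPGDK and LR, while the same sequence with "AspN" gives AKRPG and DKLR.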
The molecular weights of the fragments can be determined
experimentally by mass spectrometry to a good level of accuracy.
More importantly, these methods typically require very small
samples, on the order of fractions of a picomole.
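For comparison with measured values, the weight of a fragment can be computed from its sequence by adding up the residue masses plus one water molecule. The Python sketch below uses the standard average residue masses; the help excerpt does not say whether MassSearch works with average or monoisotopic masses, so that choice is an assumption, as is the function name.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one H2O for the free N- and C-termini


def fragment_mass(fragment):
    """Average molecular weight of an unmodified peptide fragment."""
    total = WATER
    for residue in fragment.upper():
        if residue not in RESIDUE_MASS:
            raise ValueError("unknown residue: " + residue)
        total += RESIDUE_MASS[residue]
    return total
```

Modified residues, disulfide bridges and charged ions are ignored here, so the numbers are only a first approximation of what a spectrometer reports.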
. . . .
This type of searching has been found particularly useful in the
following circumstances:
o To identify proteins when the amount available is very small,
for example, an amount that can be separated by 2D gels.
o To determine whether an unknown protein is already known in the
database before spending a significant effort in sequencing.
o To identify more than one protein which cannot be separated by
other means (this method has been successfully used to identify
two proteins which were digested together).
The template of the body of the message to be sent to
cbrg at inf.ethz.ch is (between but not including the dashed lines):
---------------------------------------------------------------------
MassSearch
Trypsin: 1524.0, 1509.7, 1387.5, 1169.4, 1014.4, 842.5,
836.4, 743.2, 717.2, 563.1, 511.3
---------------------------------------------------------------------
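A message body in the form of the template above can be put together with a short Python helper. This is only a sketch: the function name is made up, it writes all weights on one line with one decimal place, and whether the server cares about line wrapping or number format is not stated in the help excerpt. It builds the text only and does not send any mail.

```python
def build_request(enzyme, masses):
    """Return a MassSearch message body for one enzyme and its fragment weights."""
    weights = ", ".join(f"{m:.1f}" for m in masses)
    # First line names the service, second line gives "Enzyme: w1, w2, ..."
    return f"MassSearch\n{enzyme}: {weights}\n"


if __name__ == "__main__":
    # The weights from the template in the help file.
    print(build_request("Trypsin", [1524.0, 1509.7, 1387.5, 1169.4, 1014.4,
                                    842.5, 836.4, 743.2, 717.2, 563.1, 511.3]))
```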
. . . . .
A complete description of the algorithm and the probability
foundations can be found in chapter 20 of "A tutorial introduction
to computational biochemistry using the Darwin system" by G.H.
Gonnet.
. . . . .