Molecular Weight search software wanted

Gaston Gonnet gonnet at inf.ethz.ch
Wed Jun 30 14:20:20 EST 1993

In article <1993Jun30.165111.27075 at comp.bioz.unibas.ch> doelz at urz.unibas.ch writes:
>I need a software which searches SWISS-PROT or similar 
>protein sequence database with a proteolytic enzyme and 
>a molecular weight. 
>give a molecular weight range (e.g., 32100-32500 )
>give an Enzyme (e.g. Trypsin)
>desired result: 
>List of entries which contain trypsin fragments matching the molecular
>If anyone has/knows such software (preferably, VMS or POSIX) 
>please let me know. 

This is exactly what our automatic server offers under the name
"MassSearch".  You can use the server through electronic mail, or
you can get your own copy of Darwin (the system that performs the
search).

For more information mail a message with

"help MassSearch"  or just "help" to   cbrg at inf.ethz.ch

Here are some excerpts from the help file:

In some cases, proteins can be recognized by fragmenting the
protein according to a certain pattern and using the molecular
weights of the fragments as a trace.  This method is not effective
for finding the composition of an unknown protein, but it is
effective in locating an unknown sample if its sequence is
recorded in a protein database.
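
To make the "trace" idea concrete, here is a minimal sketch in
Python.  It is not the MassSearch algorithm (which rests on the
probability foundations described in the Darwin tutorial); it only
counts how many observed weights fall within a tolerance of some
theoretical fragment weight of each database entry.  The function
names, the tolerance and the toy database are all assumptions made
for illustration.

```python
def count_matches(observed, theoretical, tolerance=1.0):
    """Count observed weights lying within `tolerance` of any theoretical weight."""
    return sum(
        1
        for w in observed
        if any(abs(w - t) <= tolerance for t in theoretical)
    )


def rank_entries(observed, database, tolerance=1.0):
    """Rank database entries by number of matched weights (naive score).

    `database` maps an entry name to the list of fragment weights
    expected from digesting that entry (hypothetical layout).
    Returns (score, name) pairs, best score first, ties by name.
    """
    scores = [
        (count_matches(observed, fragments, tolerance), name)
        for name, fragments in database.items()
    ]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Toy data, invented for the example.
toy_database = {
    "ENTRY_A": [1524.2, 842.4, 300.0],
    "ENTRY_B": [100.0, 842.9],
}
ranking = rank_entries([1524.0, 842.5], toy_database)
```

A real search would weigh each match by how likely it is to occur
by chance; a plain count like this one favours long proteins, which
produce many fragments.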

One way of breaking a protein into smaller pieces according to a
certain pattern is to use enzymes which digest the protein.  For
example, trypsin breaks a protein after every Arginine (R) or
after every Lysine (K) not followed by a Proline (P).  AspN breaks
a protein before every Aspartic acid (D).  A table of recognized
enzymes and their cleavage rules is given below.
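
The two cleavage rules just described can be sketched as a small
Python function.  This is an illustration only, not Darwin code.
It assumes the proline exception applies to both R and K (the
usual convention for trypsin), and it assumes complete digestion
with no missed cleavages.  The name `digest` is hypothetical.

```python
def digest(sequence, enzyme):
    """Split a one-letter protein sequence into fragments.

    Trypsin: cut after K or R unless the next residue is P
             (assumption: the P exception covers both K and R).
    AspN:    cut before every D.
    """
    if enzyme not in ("Trypsin", "AspN"):
        raise ValueError("unknown enzyme: " + enzyme)
    seq = sequence.upper()
    if not seq:
        return []

    cuts = []  # a cut at i falls between seq[i-1] and seq[i]
    for i in range(1, len(seq)):
        before, after = seq[i - 1], seq[i]
        if enzyme == "Trypsin":
            cut_here = before in "KR" and after != "P"
        else:  # AspN
            cut_here = after == "D"
        if cut_here:
            cuts.append(i)

    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


tryptic = digest("AKRPGKDE", "Trypsin")  # R is followed by P, so no cut there
aspn = digest("AKRPGKDE", "AspN")
```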

The molecular weight of fragments can be found experimentally by
mass spectrometry methods to a good level of accuracy.  More
importantly, these methods typically require very small samples,
on the order of fractions of a picomole.
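
For comparison with measured values, the theoretical weight of a
fragment is the sum of its residue masses plus one water molecule.
The Python sketch below uses average residue masses rounded to two
decimals; these are approximate textbook values, not necessarily
the table Darwin uses, and the sketch ignores any chemical
modifications.  The names are hypothetical.

```python
# Approximate average residue masses in daltons (rounded; assumption:
# unmodified residues, standard 20 amino acids only).
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one H2O per free peptide (N-terminal H plus C-terminal OH)


def fragment_weight(peptide):
    """Return the approximate average molecular weight of a peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER


glycine = fragment_weight("G")     # free glycine, about 75.07
dipeptide = fragment_weight("AK")  # 71.08 + 128.17 + 18.02
```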

. . . .

This type of searching has been found particularly useful in the
following circumstances:

o  To identify proteins when the amount available is very small,
   for example the amounts that can be separated by 2D gels.
o  To determine whether an unknown protein is already in the
   database before spending significant effort on sequencing.
o  To identify more than one protein which cannot be separated by
   other means (this method has been successfully used to identify
   two proteins which were digested together).

The template of the body of the message to be sent to
cbrg at inf.ethz.ch is (between but not including the dashed lines):

Trypsin: 1524.0, 1509.7, 1387.5, 1169.4, 1014.4, 842.5,
          836.4,  743.2,  717.2,  563.1,  511.3
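
If the weights come out of another program, a line in the layout
shown above can be produced with a few lines of Python.  This
sketch reproduces only the enzyme and weight list as it appears in
this excerpt; any other lines the server expects should be taken
from the full help file.  The function name and the six-per-row
wrapping are assumptions.

```python
def format_weight_line(enzyme, weights, per_line=6):
    """Format 'Enzyme: w1, w2, ...' with one decimal per weight.

    Continuation rows are indented to line up under the first weight.
    """
    items = ["%.1f" % w for w in weights]
    rows = [
        ", ".join(items[i:i + per_line])
        for i in range(0, len(items), per_line)
    ]
    indent = " " * (len(enzyme) + 2)  # width of "Enzyme: "
    return enzyme + ": " + (",\n" + indent).join(rows)


body = format_weight_line("Trypsin", [1524.0, 1509.7, 1387.5])
```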

. . . . .

A complete description of the algorithm and the probability
foundations can be found in chapter 20 of "A tutorial introduction
to computational biochemistry using the Darwin system" by G.H.

. . . . . 
