Mumps (the language), hierarchical and multi-dimensional data bases

Kevin O'Kane okane at cs.uni.edu
Mon Jun 24 04:28:45 EST 2002

The Mumps language originated in the mid-1960s at the Massachusetts General Hospital. 
The acronym stands for "Massachusetts General Hospital Utility Multi-Programming System". 
Although it has been used in a number of areas, its primary application is in medicine.
While the proprietary implementations have consolidated into the hands of a small 
number of companies, we have developed an open source version of the language which is 
distributed freely under the GNU GPL and LGPL licenses.

Mumps is potentially attractive for bioinformatics applications because:

- It supports a hierarchical data base facility. Mumps data sets are not only organized 
  along traditional sequential and direct access methods, but also as hierarchical trees 
  whose data nodes are addressed as path descriptions in a manner which is easy for a programmer 
  to master in a relatively short time.  

- The data base can also be viewed as string-indexed, multidimensional matrices of effectively 
  unlimited size.  

- The underlying data base processor, the Berkeley DB, can be configured for data bases up to 
  256 terabytes in size.

- It has flexible and powerful string manipulation facilities. The built-in Mumps string
  manipulation operators and functions, which include the Perl Compatible Regular Expressions
  (PCRE) library, permit complex string manipulation and pattern matching operations.

- This version of Mumps, unlike all others, is a compiler that translates Mumps code to C.
  Mumps subroutines can be constructed which can be called by any other program that obeys
  the C calling conventions.  Similarly, Mumps programs and subroutines can call any other
  system facility that uses a C calling structure.  This feature is unique to this version of
  Mumps.

- The data base can operate in standalone or client-server mode.  In standalone mode, multiple
  programs can simultaneously access the same data base files.  In client-server mode, Mumps
  client routines can access local or remote Mumps data bases through TCP/IP or UDP connections.
  TCP/IP connections have the option of using OpenSSL encryption.  These are compile-time
  options and require no specific program modifications to use.

- Mumps programs can be used with the GTK-based Glade "drag and drop" GUI builder.  This permits
  rapid deployment of user-friendly GUIs (see references below for examples).

- Mumps routines can be used to easily construct CGI scripts for data base access.  Mumps programs
  can be called directly by the web server and have built-in facilities to parse the QUERY_STRING
  environment variable to instantiate program variables and data (see references).

- There are built-in commands to access PostgreSQL RDBMS data bases (these can be modified for MySQL).
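
To make the first two points above more concrete, here is a minimal sketch of the idea that
one data set can be read both as a hierarchical tree addressed by paths and as a
string-indexed, multidimensional matrix.  It is written in Python purely as an illustrative
model: it is not Mumps syntax, it does not use the Mumps data base, and every name in it
(HierarchicalStore, set, get, children) is hypothetical.

```python
# Illustrative Python model only; this is NOT Mumps syntax or the Mumps API.
# All names here (HierarchicalStore, set, get, children) are hypothetical.


class HierarchicalStore:
    """A tree whose nodes are addressed by a path of string subscripts."""

    def __init__(self):
        # Each node is stored under its full path,
        # e.g. ("gene", "BRCA1", "chromosome").
        self._nodes = {}

    def set(self, path, value):
        """Store a value at the node addressed by the given path."""
        self._nodes[tuple(path)] = value

    def get(self, path, default=None):
        """Return the value at a path, or default if no data is stored there."""
        return self._nodes.get(tuple(path), default)

    def children(self, path=()):
        """Return the sorted subscripts found directly below the given path."""
        prefix = tuple(path)
        depth = len(prefix)
        found = {
            key[depth]
            for key in self._nodes
            if len(key) > depth and key[:depth] == prefix
        }
        return sorted(found)


db = HierarchicalStore()
db.set(("gene", "BRCA1", "organism"), "Homo sapiens")
db.set(("gene", "BRCA1", "chromosome"), "17")
db.set(("gene", "TP53", "chromosome"), "17")

# Tree view: list the nodes one level below "gene".
print(db.children(("gene",)))                    # ['BRCA1', 'TP53']

# Matrix view: the same node read as a cell indexed by three strings.
print(db.get(("gene", "BRCA1", "chromosome")))   # 17
```

The sketch keeps everything in memory and scans all keys on each traversal; it is meant only
to show the addressing scheme, not how the underlying data base processor stores or searches
its data.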

We have done some initial testing of Mumps in connection with the NCBI BLAST
software.  In the test, we moved data directly from the "doblast" example output
routines (ftp.ncbi.nih.gov/blast/demo) to a Mumps data base without problems. As
a result, there appear to be no compatibility issues.  An example is given in the
references below.

We would be very interested in any suggestions regarding how we might extend this work to
make it more useful for bioinformatics applications.

All the software is open source and licensed under the GNU GPL/LGPL. The main web page for this work, which 
includes coding examples, manuals and so forth, is:


The direct link to the documentation is:


The link to the BLAST example is:


The source code is at:


The main development vehicle is Linux.

Kevin C. O'Kane
Department of Computer Science 
University of Northern Iowa
Cedar Falls, IA 50614-0507
(319) 273 7322 (Office + Voice Mail)
(319) 266 4131 (Iowa)
(508) 778 9485 (Massachusetts)
okane at cs.uni.edu

