
GeneX: an Open Source Gene Expression Database

Harry Mangalam mangalam at home.com
Fri Mar 23 20:00:01 EST 2001

                 The GeneX Gene Expression Database 
                             from the 
                National Center for Genome Resources 

The GeneX team is happy (well, relieved anyway ;) ) to announce a public
release of NCGR's GeneX Gene Expression Database system.

The GeneX project is an Open Source endeavor to provide the gene expression
community with a way of designing a system that best meets its needs.  The
system can be downloaded from Sourceforge (http://genex.sourceforge.net) or
from NCGR's GeneX web site (http://genex.ncgr.org) and is licensed under the

We hope this provides a partial outlet for those labs that have experience
with expression analysis, some coding ability, and the desire to contribute
to a project that they can extend, with a little effort, to add the features
they want, without having to build a system from scratch.

It provides a basic working system including:

- installation scripts
- a small amount of example data
- the Genex.pm Perl wrapper API to the database
- utilities to manipulate the XML transport format
- analytical tools which can be used with data from the database or with data
  uploaded directly.  They include apps for significance & permutation 
  testing as well as several kinds of clustering.
  * CyberT - a significance testing tool which uses repeated t-tests (with
    Bonferroni correction) and an optional Bayesian estimation of variance.
    CyberT also uses xgobi (an X Window app) to perform 3D
    visualizations of the results, Principal Component Analysis, and linked
  * Rcluster - an interface to the R cluster libs (several clustering
    approaches, using several metrics)
  * xcluster - Gavin Sherlock's speedy and memory-efficient clustering app
    which also includes K-means clustering and Self-Organizing Maps (we
    provide the interface - you have to license the xcluster code directly 
    from Stanford:  http://genome-www.stanford.edu/~sherlock/cluster.html)
- an interactive tool to load & annotate data (& a scriptable one is under
    development to load multiple experiments in bulk, albeit with less
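
To make the CyberT description a little more concrete, here is a minimal
sketch of the general idea: a two-sample t statistic per gene, then a
Bonferroni correction for the number of genes tested.  It is NOT CyberT's
code.  To stay within the Python standard library, the p-values come from an
exact permutation test rather than from the t distribution, the optional
Bayesian variance estimate is not implemented, and all function names, gene
names and numbers are hypothetical.

```python
# Illustrative sketch only: NOT CyberT's implementation. Per-gene Welch t
# statistics, exact permutation p-values, then a Bonferroni correction.
# All names and data here are hypothetical.
from itertools import combinations
from math import copysign, inf, sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Welch two-sample t statistic (does not assume equal variances)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # Degenerate case: no spread in either group.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / se


def permutation_p_value(a, b):
    """Exact two-sided p-value: the fraction of all ways of relabeling the
    pooled values into groups of len(a) and len(b) whose |t| is at least
    as large as the observed |t|."""
    observed = abs(t_statistic(a, b))
    pooled = list(a) + list(b)
    hits = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        g1 = [pooled[i] for i in idx]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(t_statistic(g1, g2)) >= observed - 1e-9:  # tolerate rounding
            hits += 1
    return hits / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical log-expression values: 5 control vs 5 treated replicates.
expression = {
    "geneA": ([5.1, 5.3, 5.2, 5.4, 5.0], [7.9, 8.1, 8.0, 8.2, 7.8]),
    "geneB": ([6.0, 6.2, 5.9, 6.1, 6.0], [6.1, 5.8, 6.0, 6.3, 6.0]),
}

raw = {gene: permutation_p_value(a, b) for gene, (a, b) in expression.items()}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))
for gene in expression:
    print(f"{gene}: raw p = {raw[gene]:.4f}, Bonferroni p = {adjusted[gene]:.4f}")
```

Enumerating every relabeling is only practical for a handful of replicates
per condition (C(10,5) = 252 here); with larger groups you would sample random
permutations instead.  Bonferroni simply multiplies each raw p-value by the
number of genes tested, which is easy to apply but conservative when thousands
of genes are on the array.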

Its advantages are that it:
- is freely available in source code (tarball and anonymous CVS).
- has relatively small hardware requirements.
- requires no proprietary software to run.
- is relatively simple to install .. operative word 'relatively' ;) .
- supports multiple kinds of array data (Affy, Cy3/Cy5, radiolabeled blots).
- can incorporate command-line analytical routines very easily as CGIs.
- can share data via an XML format for which free tools are available.
- has been developed using a number of R (aka GNU S) libs
    (http://cran.r-project.org/) and will continue to add more support for
    this Open Source Software approach.
- can export data in a variety of formats for use with other tools.
  * J-Express (http://www.ii.uib.no/~bjarted/jexpress) can directly import
    one of the standard export formats.
  * with a local installation, you can export the data in xgobi format and
    with minimal scripting in R, you could export in a number of other 
    formats as well.  Use the source, Luke!
- has a fairly active development community.

Its disadvantages (hey! it's free; there ARE disadvantages!) are:
- the user interface is crude.
- the query interface is crude and simple (but pretty easy to customize).
- we do not yet provide for easy normalization, although such an interface is
    under development (contributed by an external user) and more input would
    be most welcome.
- it uses a heterogeneous (albeit standard) mix of software components.
- it requires some knowledge of Linux and Postgres (or whichever RDBMS you
    want to implement it in) to make it work.  It is definitely *NOT* 
    Plug and Play.
- it is a relatively young project and therefore will probably not support 
    some critical operations.
- the current data loader is functional, but sub-optimal (and is being
    re-written from scratch with the input of several labs).
- there are some known security issues (and certainly more unknown ones).
- its scalability is largely untested.

We're hoping that with enough interested, engaged users, each contributing
what they can (suggestions, bug descriptions, & especially code), useful
features can be suggested and implemented, bugs can be killed quickly, ports
to additional RDBMS can be completed, and useful applications can be added.
We also hope to improve the Data Model and XML feature set and contribute
those improvements back to make the MGED MAML XML as robust as it needs to be.

We welcome your feedback (really!) [genex at ncgr.org]

The GeneX Team

  * Bill Beavis                        > William Anderson
  * Greg Colello                       > Andrew Dalke
  * Harry Mangalam                     > Carol Harger
  * Lonny Montoya                      > Peter Hraber
  * Michael Pear (honorary)            > Karen Schlauch
  * Todd Peterson                      > Mark Waugh
  * Jason Stewart                      > Jennifer Weller
  * Jiaye Zhou

  * current                            > alumni (thanks guys!)

More information about the Bionews mailing list
