The GeneX Gene Expression Database
National Center for Genome Resources
The GeneX team is happy (well, relieved anyway ;) ) to announce a public
release of NCGR's GeneX Gene Expression Database system.
The GeneX project is an Open Source endeavor to provide the gene expression
community a way of designing a system that best meets their needs. The
system can be downloaded from Sourceforge (http://genex.sourceforge.net) or
from NCGR's GeneX web site (http://genex.ncgr.org) and is licensed under the
We hope this provides a partial outlet for those labs that have experience
with expression analysis, some coding ability, and the desire to contribute
to a project to which they can, with a little effort, add the features they
want without a build-from-scratch effort.
It provides a basic working system including:
- installation scripts
- a small amount of example data
- the Genex.pm Perl wrapper API to the database
- utilities to manipulate the XML transport format
- analytical tools which can be used with data from the database or with data
uploaded directly. They include apps for significance & permutation
testing as well as several kinds of clustering.
* CyberT - a significance testing tool which uses repeated t-tests (with
Bonferroni correction) and an optional Bayesian estimation of variance.
CyberT also uses xgobi (an XWindows app) to perform 3D
visualizations of the results, Principal Component Analysis, and linked
* Rcluster - an interface to the R cluster libs (several clustering
approaches, using several metrics)
* xcluster - Gavin Sherlock's speedy and memory-efficient clustering app
which also includes KMeans clustering and Self-Organizing Maps (we
provide the interface - you have to license the xcluster code directly
from Stanford: http://genome-www.stanford.edu/~sherlock/cluster.html)
- an interactive tool to load & annotate data (& a scriptable one is under
development to load multiple experiments in bulk, albeit with less
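As a rough illustration of the kind of significance and permutation testing
mentioned above, here is a minimal Python sketch. It is not CyberT's code or
method (CyberT uses t-tests with an optional Bayesian variance estimate);
it swaps in an exact permutation test on the difference in means, followed
by a Bonferroni correction across genes. The function names, gene names, and
expression values are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(control, treated):
    """Exact two-sided permutation test on the difference in group means.

    Enumerates every way of splitting the pooled values into two groups of
    the original sizes, so it is only practical for small replicate counts.
    """
    pooled = list(control) + list(treated)
    n_control = len(control)
    n_treated = len(pooled) - n_control
    observed = abs(mean(treated) - mean(control))
    total = sum(pooled)

    hits = 0
    count = 0
    for idx in combinations(range(len(pooled)), n_control):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs((total - sum_a) / n_treated - sum_a / n_control)
        # Small tolerance so float rounding does not drop exact ties.
        if diff >= observed - 1e-12:
            hits += 1
        count += 1
    return hits / count


def bonferroni(p_values):
    """Bonferroni correction: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical genes with three control and three treated replicates each.
    experiments = {
        "geneA": ([1.0, 2.0, 3.0], [10.0, 11.0, 12.0]),
        "geneB": ([5.0, 6.0, 7.0], [5.5, 6.5, 6.0]),
    }
    names = list(experiments)
    raw = [permutation_p_value(*experiments[name]) for name in names]
    for name, p, p_adj in zip(names, raw, bonferroni(raw)):
        print(name, "raw p =", p, "Bonferroni p =", p_adj)
```

With only three replicates per group there are just 20 possible splits, so
the smallest achievable two-sided p-value is 0.1; real experiments need more
replicates (or a different test) before a Bonferroni-corrected result can
reach conventional significance thresholds.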
Its advantages are that it:
- is freely available in source code (tarball and anonymous CVS).
- has relatively small hardware requirements.
- requires no proprietary software to run.
- is relatively simple to install .. operative word 'relatively' ;) .
- supports multiple kinds of array data (Affy, Cy3, Cy5, radiolabeled blot).
- can incorporate commandline analytical routines very easily as CGIs.
- can share data via an XML format for which there are free tools available.
- has been developed using a number of R (aka GNU S) libs
(http://cran.r-project.org/), and we will continue to add more support for
this Open Source Software approach.
- can export data in a variety of formats for use with other tools.
* J-Express (http://www.ii.uib.no/~bjarted/jexpress) can directly import
one std format.
* with a local installation, you can export the data in xgobi format and
with minimal scripting in R, you could export in a number of other
formats as well. Use the source, Luke!
- has a fairly active development community.
Its disadvantages (hey! it's free; there ARE disadvantages!) are:
- the user interface is crude.
- the query interface is crude and simple (but pretty easy to customize).
- we do not yet provide for easy normalization, although such an interface is
under development (contributed by an external user) and more input would
be most welcome.
- it uses a heterogeneous (albeit standard) mix of software components.
- it requires some knowledge of Linux and Postgres (or whatever RDBMS in
which you want to implement it) to make it work. It is definitely *NOT*
Plug and Play.
- it is a relatively young project and therefore will probably not support
some critical operations.
- the current data loader is functional, but sub-optimal (and is being
re-written from scratch with the input of several labs).
- there are some known security issues (and certainly more unknown ones).
- its scalability is largely untested.
We're hoping that with enough interested, engaged users, each contributing
what they can (suggestions, bug descriptions, & especially code), useful
features can be suggested and implemented, bugs can be killed quickly, ports
to additional RDBMSs can be completed, useful applications can be added, and
the Data Model and XML feature set can be improved and contributed back to
make the MGED MAML XML as robust as it needs to be.
We welcome your feedback (really!) [genex at ncgr.org]
The GeneX Team
* Bill Beavis > William Anderson
* Greg Colello > Andrew Dalke
* Harry Mangalam > Carol Harger
* Lonny Montoya > Peter Hraber
* Michael Pear (honorary) > Karen Schlauch
* Todd Peterson > Mark Waugh
* Jason Stewart > Jennifer Weller
* Jiaye Zhou
* current > alumni (thanks guys!)