LuceGene document/object search system release 1.4

Don Gilbert gilbertd at bio.indiana.edu
Fri Feb 25 16:49:20 EST 2005

LuceGene release 1.4 is available now at
and http://eugenes.org/gmod/lucegene/

LuceGene is an open-source document/object search and retrieval system
specially tuned for bioinformatics text databases and documents.  It is
similar in concept to the commercial SRS package (Sequence Retrieval
System). LuceGene is written in Java and built on the open-source Lucene
package [http://jakarta.apache.org/lucene/].

This release includes an easy-to-use demonstration. Pop it into a Tomcat
web server and run it.

LuceGene adds these bioinformatics methods to Lucene:

 * Indexing adaptors for formats such as XML, PDF Documents,
 Biosequences, Spreadsheets, HTML, and others, with fine tuning by data

 * Configurations for bio-data include UniProt/Swiss-Prot, FASTA and
 GenBank sequences, BIND protein interactions, BLAST outputs,
 Medline, and others.

 * Support for batch-list look-ups and searches by ID, gene names, etc.

 * Web interface with paged results, batch downloads, search
 refinement and search-linking among data libraries.

 * Web Services support with a SOAP interface.

 * Output support for data-field selection and formats such as
 Spreadsheet, XML, HTML, and others.
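
As a rough illustration of the batch-list look-up idea in the list
above, here is a minimal Java sketch. It is not LuceGene or Lucene code:
the class BatchLookupSketch, its methods, and the sample IDs are all
hypothetical, and a plain in-memory map stands in for a real index.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only; not part of LuceGene or Lucene.
class BatchLookupSketch {
    // Maps a lower-cased ID or gene name to a stored record (stand-in for a real index).
    private final Map<String, String> index = new LinkedHashMap<>();

    void add(String id, String record) {
        index.put(normalize(id), record);
    }

    // Looks up every ID in the batch list, in input order.
    // IDs that are not in the index map to null so the caller can report them.
    Map<String, String> lookupAll(List<String> ids) {
        Map<String, String> hits = new LinkedHashMap<>();
        for (String id : ids) {
            String key = normalize(id);
            if (key.isEmpty()) {
                continue; // skip blank lines in a pasted ID list
            }
            hits.put(id.trim(), index.get(key));
        }
        return hits;
    }

    private static String normalize(String id) {
        return id.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        BatchLookupSketch idx = new BatchLookupSketch();
        idx.add("GENE0001", "made-up record one"); // sample data, not real IDs
        idx.add("GENE0002", "made-up record two");
        Map<String, String> hits = idx.lookupAll(Arrays.asList("gene0002", "GENE9999"));
        for (Map.Entry<String, String> e : hits.entrySet()) {
            String value = e.getValue() == null ? "(not found)" : e.getValue();
            System.out.println(e.getKey() + "\t" + value);
        }
    }
}
```

Keeping the not-found IDs in the result, rather than dropping them, lets
a batch report show which entries of a pasted list had no match.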

It can take as little as a few hours of engineering time to add parsing
for a new databank, making it a cost-effective way to use many
bioinformatics data sets.
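
To give a feel for the scale of such a parser, the sketch below splits
FASTA text into named fields that an indexer could then store. It is
only an assumption about what a small databank parser might look like,
not LuceGene's adaptor API: the class FastaFieldParserSketch, the field
names "id", "description" and "sequence", and the sample records are
hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not LuceGene's actual adaptor interface.
class FastaFieldParserSketch {

    // Splits FASTA text into one field map per record.
    // The header is cut at the first space: the first token becomes "id",
    // the rest becomes "description". Sequence lines are concatenated.
    static List<Map<String, String>> parse(String text) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = null;
        StringBuilder seq = new StringBuilder();

        for (String raw : text.split("\r?\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if any.
                if (current != null) {
                    current.put("sequence", seq.toString());
                    records.add(current);
                }
                current = new LinkedHashMap<>();
                seq.setLength(0);
                String header = line.substring(1).trim();
                int sp = header.indexOf(' ');
                current.put("id", sp < 0 ? header : header.substring(0, sp));
                current.put("description", sp < 0 ? "" : header.substring(sp + 1).trim());
            } else if (current != null) {
                seq.append(line); // lines before the first header are ignored
            }
        }
        if (current != null) {
            current.put("sequence", seq.toString());
            records.add(current);
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample input.
        String sample = ">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n";
        for (Map<String, String> rec : parse(sample)) {
            System.out.println(rec);
        }
    }
}
```

Each returned map corresponds to one searchable record; a real adaptor
would hand these fields to the indexing layer rather than print them.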

LuceGene is fast on large data sets: indexing and searching the
UniProt library of 1.7 million sequences with LuceGene is comparable in
speed to using SRS. Search and retrieval of gene annotation objects with
LuceGene is 10x to 20x faster than with a Postgres Chado database.

-- Don Gilbert
Genome Informatics Lab
Indiana University, Bloomington IN
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/
