genes, drugs and disease search engine tuning

Kasian Franks kfranks at lbl.gov
Mon May 17 02:01:56 EST 2004

I've been building a system that I call GenoPharm at my lab. It's an
exploratory search engine that uses methods I've developed to extract
uniquely significant relationships between genes, drugs and diseases.

Methods used to relate one gene to another vary widely and range
from precise to general catch-all approaches. Using common methods,
one can relate two genes by their sequence similarity, a few genes
by experimentally validated functional similarity, or thousands of
genes at once after automatically determining that they share the
same annotation in biomedical text.
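
As a rough illustration of the last, catch-all case, here is a minimal Python sketch that relates genes whenever they share an annotation term. It is not how GenoPharm works; the gene names, the annotation terms and the function relate_by_shared_annotation are all made up for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical annotations: gene names and terms are placeholders, not real data.
ANNOTATIONS = {
    "GENE_A": {"dna repair", "cell cycle"},
    "GENE_B": {"dna repair"},
    "GENE_C": {"cell cycle", "apoptosis"},
    "GENE_D": {"lipid transport"},
}

def relate_by_shared_annotation(annotations):
    """Return {(gene1, gene2): shared_terms} for every pair sharing a term."""
    # Invert the mapping: annotation term -> genes carrying that term.
    genes_by_term = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            genes_by_term[term].add(gene)

    # Every pair of genes under the same term is considered related.
    related = defaultdict(set)
    for term, genes in genes_by_term.items():
        for pair in combinations(sorted(genes), 2):
            related[pair].add(term)
    return dict(related)

pairs = relate_by_shared_annotation(ANNOTATIONS)
for (left, right), terms in sorted(pairs.items()):
    print(left, right, sorted(terms))
```

A single broad term is enough to link a pair here, which is why this kind of approach scales to thousands of genes but says little about why two genes are related.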

Some methods can be compared to visiting an aquarium to look at sea
creatures while others can be compared to taking a boat and scuba gear
out to sea to observe the ones you're interested in. The optimum
approach in relating genes, diseases etc. in text would include
simulating the steps a human researcher would take in analyzing
biomedical literature to relate genes with one another, genes to a
disease, genes to a tissue or genes to a therapeutic and then
executing those steps as fast as a computer. To date, a system like
this has not been developed.

Starting with a gene, a researcher typically defines and controls the
context and individual relationships allowed to connect a gene with
another gene or disease during the process of reading and analyzing
literature. During this flexible process the context of interest may
change, become refined or shift and take on a new direction depending
on the value of the text. After this process is complete to the
satisfaction of the researcher, the researcher is left with a valuable
collection of related genes, diseases, etc. that are all related to a
specific theme or context of interest. A system designed to simulate
this process would need to be flexible and interactive allowing the
researcher to define and control the context and individual
relationships during each step of the investigation process while at
the same time allowing for quality and quantity of relationships to be
determined and visualized interactively by the researcher.

By combining an interactive role, similar to what a researcher engages
in during the process of experimentation, and applying it to an
iterative process of automated text mining methods one can choose the
directions and define the relationships each step of the way as
connections are made between genes of interest. Interactively defining
and extracting relationships between genes and other genes, diseases,
tissues or therapeutics would provide a rare and valuable level of
precision for relationship exploration and discovery in biomedical

These systems allow a list of genes, or gene set, to be submitted
and networked via an automatic or user-defined biological context for
the auto-discovery of genes that link others in the network, and then
resolve the network in the context of a disease, gene, tissue type or
therapeutic as opposed to general associations. For example, imagine
trying to form connections to the term "cone" without any context
control. Terms and other concepts connecting to the term "cone" will
be general and broad, in other words a lot less meaningful than they
could be. Now imagine forming connections to the term "cone" in the
context of "pine". Specific terms relating to "tree" will begin to
enter the connection space. Networking gene relationships with this
kind of context control allows for previously unidentified connections
to be brought to the fore as a user of the system might ask "What
relationships are there to this gene in the context of breast
cancer?". In addition, the exploration of hidden or indirect
connections between genes can be observed as further connections are
derived from the many-to-many associations that are analyzed
automatically. Applied to biology and specifically to the science of
cancer genomics in combination with gene expression data this
technique can be quite useful.
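
The "cone" example can be sketched in a few lines of Python. This is only a toy co-occurrence count under my own assumptions, not the method GenoPharm uses: the four one-line "documents", the stopword list and the function connections are invented for illustration.

```python
from collections import Counter

# Made-up one-line "documents" used only to illustrate the idea.
DOCS = [
    "pine cone seeds fall from the tree",
    "ice cream cone with chocolate",
    "traffic cone on the road",
    "the pine tree drops a cone each autumn",
]

STOPWORDS = {"the", "a", "on", "with", "from", "each"}

def connections(docs, target, context=None):
    """Count terms co-occurring with `target`.

    If `context` is given, only documents that also mention the
    context term contribute to the counts.
    """
    counts = Counter()
    for doc in docs:
        words = set(doc.lower().split())
        if target not in words:
            continue
        if context is not None and context not in words:
            continue
        # Drop stopwords plus the query terms themselves.
        counts.update(words - STOPWORDS - {target, context})
    return counts

broad = connections(DOCS, "cone")                    # no context control
focused = connections(DOCS, "cone", context="pine")  # "cone" in the context of "pine"
print(sorted(broad))
print(sorted(focused))
```

Without a context, "ice", "traffic" and "road" all connect to "cone"; with the context "pine", only the tree-related terms remain in the connection space.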

Annotation, changes in ontology and updates to literature sources
relating to functional and biochemical pathways are subjective and
constantly evolving and this makes interpretation of gene expression
data and the interrelationships among genes within a dataset a very
complex and daunting task. By submitting a set of genes to the system,
a resulting association and interaction network can be automatically
constructed between genes within the set of genes submitted. The
network is context controlled via user-defined parameters so in turn
the relationships between the genes can be defined or categorized.
This data can also be used to automatically generate paths to
additional genes outside the set of genes submitted that share
disease, chemical or functional relationships.
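
To make the idea concrete, here is a minimal Python sketch of a context-controlled network built from a submitted gene set, plus a lookup of genes outside the set that link two or more submitted genes. It is an assumption-laden toy, not the GenoPharm implementation: the association list, the gene names and the functions build_network and linking_genes are hypothetical.

```python
from collections import defaultdict

# Hypothetical pairwise associations (gene, gene, relationship type).
# Gene names are placeholders, not real data.
ASSOCIATIONS = [
    ("GENE_A", "GENE_B", "disease"),
    ("GENE_A", "GENE_X", "functional"),
    ("GENE_X", "GENE_C", "functional"),
    ("GENE_B", "GENE_Y", "chemical"),
    ("GENE_C", "GENE_Z", "disease"),
]

def build_network(associations, allowed_types):
    """Undirected adjacency map restricted to the allowed relationship types."""
    network = defaultdict(set)
    for left, right, kind in associations:
        if kind in allowed_types:
            network[left].add(right)
            network[right].add(left)
    return network

def linking_genes(network, submitted):
    """Genes outside `submitted` that neighbour at least two submitted genes."""
    submitted = set(submitted)
    links = {}
    for gene, neighbours in network.items():
        if gene in submitted:
            continue
        shared = neighbours & submitted
        if len(shared) >= 2:
            links[gene] = shared
    return links

submitted = ["GENE_A", "GENE_B", "GENE_C"]
network = build_network(ASSOCIATIONS, {"disease", "functional", "chemical"})
links = linking_genes(network, submitted)

# Narrowing the user-defined context changes which edges exist at all.
functional_only = build_network(ASSOCIATIONS, {"functional"})
print(links)
```

With all relationship types allowed, GENE_X shows up as a gene outside the submitted set that connects GENE_A and GENE_C; restricting the context to functional relationships drops GENE_B from the network entirely.
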

I've been able to integrate therapeutic relationships into the
GenoPharm system I've built using http://www.pharmgkb.com. However, the
database of relationships is so small that I am rarely able to get
GenoPharm to establish connections to drugs. Does anyone know of a
much larger database of drug-gene relationships that can be
downloaded? Thanks

Kasian Franks 
Life Sciences Division (Mail Stop 83-0101) 
Land: KFranks at lbl.gov 
Sea: 510 393 6221, 510 486 2982 
Lawrence Berkeley National Laboratory 
1 Cyclotron Road 
Berkeley, CA 94720-8265

More information about the Proteins mailing list
