We did a project a few years ago in which a neural network approach was
applied to the prokaryotic promoter problem. The results were pretty decent,
especially considering that it was only a 6-month project that was really
focused on massively parallel implementations of neural nets for sequence
analysis applications, and this was just a test case. It was fun to watch
the hidden layer activate at the "-35" and "-10" conserved regions, anyway.
What's particularly cool is that if you use a network with a single unit in
the hidden layer, you can pretty much read the consensus sequence (and, I
suppose, derive a detection matrix) from the activation of that hidden unit
as the sequence is shifted over the inputs.
Alas, it was a Phase I SBIR project that didn't win Phase II funding.
Anyway... for a matrix-oriented approach, see:
Schneider, Stormo, Gold, and Ehrenfeucht, J. Mol. Biol. 188, 415 (1986).
I think this was their log-odds matrix for the "-10" region:
        1      2      3      4      5      6
A   -2.76   1.82   0.06   1.23   0.96  -2.92
C   -1.46  -3.11  -1.22  -1.00  -0.22  -2.21
G   -1.76  -5.00  -1.06  -0.67  -1.06  -3.58
T    1.67  -1.66   1.04  -1.00  -0.49   1.84
I don't have the "-35" region matrix handy, though. As I recall, the actual
distance between the -10 and -35 signals can vary a bit, so be sure to apply
the two matrices separately.
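
For anyone who wants to try the matrix, here is a minimal Python sketch of
how one might slide it along a sequence. It is only an illustration, not
the method from the paper: the function names, the test sequence, and the
cutoff of 5.0 are all made up for the example, and a real cutoff would have
to be tuned against known promoters. The weights are the "-10" values
quoted above.

```python
# Log-odds weights for the "-10" region, as quoted in the post
# (one row per base, one column per position of the hexamer).
MINUS10 = {
    "A": [-2.76,  1.82,  0.06,  1.23,  0.96, -2.92],
    "C": [-1.46, -3.11, -1.22, -1.00, -0.22, -2.21],
    "G": [-1.76, -5.00, -1.06, -0.67, -1.06, -3.58],
    "T": [ 1.67, -1.66,  1.04, -1.00, -0.49,  1.84],
}
WIDTH = 6  # number of columns in the matrix


def consensus(matrix):
    """Pick the highest-weighted base in each column."""
    return "".join(
        max(matrix, key=lambda base: matrix[base][i]) for i in range(WIDTH)
    )


def score_window(matrix, window):
    """Sum the weight of the observed base at each position."""
    return sum(matrix[base][i] for i, base in enumerate(window))


def scan(matrix, seq, threshold):
    """Return (offset, hexamer, score) for every window scoring >= threshold.

    The threshold is a hypothetical cutoff chosen by the caller.
    Windows containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - WIDTH + 1):
        window = seq[start:start + WIDTH]
        if any(base not in matrix for base in window):
            continue
        score = score_window(matrix, window)
        if score >= threshold:
            hits.append((start, window, round(score, 2)))
    return hits


if __name__ == "__main__":
    print(consensus(MINUS10))
    # Made-up test sequence and cutoff, for illustration only.
    print(scan(MINUS10, "GGGCTATAATGCG", threshold=5.0))
```

A "-35" matrix, once you have one, could be scanned the same way on its
own, and the two hit lists paired up afterwards with whatever spacing
tolerance you decide on. That keeps the variable gap out of the scoring
itself.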
Happy Hunting!
John
============================================================================
John R. Hartman                    E-mail: john at cbi.com
Computational Biosciences, Inc.    Voice:  (313)426-9050
P.O. Box 2090                      Fax:    (313)426-5311
Ann Arbor, Michigan 48106
============================================================================