Hello fellow molbio netlanders:
I posted this earlier today to INFO-GCG at UTORONTO but then reconsidered and
decided that I should also share it with the wider audience of BIO-SOFT. All
program names and credits refer to Gribskov and Devereux's implementation
within the Genetics Computer Group environment. The earlier posting (a bit
long) follows below:
Fellow GCG'ers:
I extensively utilize Michael Gribskov's Profile methods of analysis in my work
for Washington State University. I am very impressed with its power and
elegance and have assembled and analyzed many profiles for the various
researchers that I work with. However, I see one sorely lacking feature of
the method: there is no algorithm for performing a local dynamic programming
alignment between two different profiles. Granted, one can find the
"bestfit" between a profile and a sequence by using ProfileGap or
ProfileSegments, and the individual sequences to be compared to the profile
could have been formed as the "consensus" of other profiles with the /SeqOut=
option of ProfileMake, but being able to fully utilize the power of high gap
penalties in conserved areas and low gap penalties in variable regions of BOTH
multiple sequence alignments could be tremendously powerful. In my own
research I have been trying to find the optimum alignments, within an
experimental profile, of very tightly conserved domains (for which I have
assembled profiles) that ProfileSearch identifies as remotely related to the
experimental protein family's multiple sequence alignment. Does this make
sense? A method to directly locally align the two profiles would be ideal.
Additionally, it would be very handy in many general instances to see how two
different profiles compare; i.e. how closely related is this protein family to
another, versus just how close is this sequence to another (or a profile)?
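
To make the request concrete, here is a minimal sketch of what a local
(Smith-Waterman style) alignment between two profiles might look like. It is
not a GCG program and does not read GCG profile files. Everything in it is an
assumption made for illustration: each profile is a plain list of
(residue frequencies, gap penalty) pairs, the column-to-column score is a
simple overlap of residue frequencies, gaps are charged linearly, and skipping
a column costs that column's own penalty, so conserved columns in BOTH profiles
resist gaps. The names column_score and local_profile_alignment are
hypothetical.

```python
def column_score(col_a, col_b):
    """Score two profile columns by residue-frequency overlap.

    Assumed scoring scheme: +1.0 when both columns are fully conserved for
    the same residue, -1.0 when they share no residues at all.
    """
    shared = sum(freq * col_b.get(res, 0.0) for res, freq in col_a.items())
    return 2.0 * shared - 1.0


def local_profile_alignment(profile_a, profile_b):
    """Local alignment score between two profiles.

    Each profile is a list of (frequencies, gap_penalty) tuples, one per
    column. Returns (best_score, end_in_a, end_in_b), with 1-based end
    positions of the best-scoring local alignment.
    """
    rows, cols = len(profile_a), len(profile_b)
    # H[i][j] holds the best local score ending at column i of A, j of B.
    H = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    best = (0.0, 0, 0)
    for i in range(1, rows + 1):
        freqs_a, gap_a = profile_a[i - 1]
        for j in range(1, cols + 1):
            freqs_b, gap_b = profile_b[j - 1]
            H[i][j] = max(
                0.0,                                          # start afresh
                H[i - 1][j - 1] + column_score(freqs_a, freqs_b),
                H[i - 1][j] - gap_a,   # column i of A is left unpaired
                H[i][j - 1] - gap_b,   # column j of B is left unpaired
            )
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best


# Toy profiles: three conserved columns (A, C, D) that also occur in B.
profile_a = [({"A": 1.0}, 2.0), ({"C": 1.0}, 2.0), ({"D": 1.0}, 2.0)]
profile_b = [({"W": 1.0}, 2.0), ({"A": 1.0}, 2.0),
             ({"C": 1.0}, 2.0), ({"D": 1.0}, 2.0)]
print(local_profile_alignment(profile_a, profile_b))
```

A real implementation would presumably score columns with a Dayhoff-type
comparison table and use separate gap opening and extension penalties; the
sketch only shows the shape of the recursion.
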
Has anybody heard of the development of a program for performing this type of
comparison within the GCG package? I posted this question to Dr. Gribskov
several weeks ago via E-mail; however, he apparently has either been too swamped
with work to respond or has been on vacation or his InterNet connection has
been down, as I have not heard back from him. I do not have the technical
expertise to write such a program if one does not exist. Is anybody
interested? If one does exist or if anybody is interested in creating such a
beast, please let me know, either on this bulletin board or personally at any
of the addresses below in my signature block.
A further, related question has been posed by my boss. In my endeavor to sell
the strengths of the profile method to him, he wanted to know if there was a
way to quantify the "quality" of a profile. Just how "good" is it? I
described Dr. Gribskov's "validation procedure" but he wanted some type of a
numeric measure instead of the rather empirical and subjective system of
validation. Has anybody heard of a way to assign a "quality" score to a
profile other than merely reporting the average Dayhoff score of the input
multiple sequence alignment as shown by PlotSimilarity? An alignment with some
extremely conserved domains and other highly divergent areas may have a lower
overall Dayhoff average than one in which all of the members are somewhat alike
yet which has no strikingly similar areas. Yet I would argue that the former
would yield a "better" profile. Is there a statistical method which could
capture this type of "information content quota"?
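
One candidate measure, offered only as a sketch, is the Shannon information
content of each alignment column: log2(20) bits minus the column's entropy.
Unlike a single overall average, it can be summarized over a sliding window so
that a few strikingly conserved domains stand out from divergent surroundings.
The function names, the treatment of "-" and "." as gap characters, the
20-letter alphabet, and the window summary are all assumptions for
illustration; this is not part of the GCG package or of Dr. Gribskov's
validation procedure.

```python
import math
from collections import Counter


def column_information(alignment, alphabet_size=20):
    """Per-column information content, in bits, of an aligned set of sequences.

    alignment is a list of equal-length strings; gap characters are ignored.
    A fully conserved column scores log2(alphabet_size); an all-gap column 0.
    """
    max_bits = math.log2(alphabet_size)
    info = []
    for column in zip(*alignment):
        residues = [r for r in column if r not in "-."]
        if not residues:
            info.append(0.0)
            continue
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(residues).values())
        info.append(max_bits - entropy)
    return info


def best_window_mean(info, width):
    """Highest mean information content over any window of `width` columns."""
    means = [sum(info[i:i + width]) / width
             for i in range(len(info) - width + 1)]
    return max(means)


# Toy alignment: two conserved columns followed by two variable ones.
alignment = ["ACDE", "ACDF", "ACEG", "ACFH"]
info = column_information(alignment)
print([round(bits, 2) for bits in info])
print(round(sum(info) / len(info), 2), round(best_window_mean(info, 2), 2))
```

Comparing the overall mean with the best window mean separates an alignment
with sharp conserved domains from one that is uniformly lukewarm, which is the
distinction the average Dayhoff score blurs.
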
Finally (and I do apologize for the long-windedness of this posting), is anyone
or is any archive interested in these profiles as I develop them? It seems a
shame to only have them available to the WSU academic community. Naturally, if
any of them were to be utilized in anybody's research I would expect
acknowledgements. I presently have more than a dozen "high-quality" profiles
prepared apart from the library already utilized by ProfileScan and am creating
new ones all of the time. I am more than willing to discuss this further, if
interest warrants.
Thank you for the time, Steve Thompson
Steven M. Thompson
Consultant in Molecular Genetics and Sequence Analysis
VADMS (Visualization, Analysis & Design in the Molecular Sciences) Laboratory
Washington State University, Pullman, WA 99164-1224, USA
AT&Tnet: (509) 335-0533 or 335-3179 FAX: (509) 335-0540
BITnet: THOMPSON at WSUVMS1 or STEVET at WSUVM1
INTERnet: THOMPSON at wsuvms1.csc.wsu.edu