I am developing a statistical technique to classify inputs according
to their structural characteristics in such a way that functionally
similar groups emerge. An example is the determination of
functionally significant features like the cellular location of
proteins (e.g., DNA binding protein, membrane protein, secretory
protein) from local sequence properties (e.g., average hydrophobicity,
maximum hydrophobicity, maximum charge, periodicity of appearance of a
specific amino acid residue, etc.). While data on these structural
attributes are readily available, functional data are more difficult
to obtain. Hence, a preliminary classification based on cheaper
structural data could serve researchers by narrowing the set of
functional properties likely to be found on closer inspection.
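
As a rough illustration of the kind of local sequence properties listed
above, here is a minimal Python sketch that turns a protein sequence into
a small feature vector. It is not the classification technique itself.
The Kyte-Doolittle hydropathy scale, the window width of 7, the charge
assignment (Lys/Arg +1, Asp/Glu -1, His neutral) and the function names
are all assumptions made for the example.

```python
# Kyte-Doolittle hydropathy values (assumed scale; one common choice).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charges (assumption: His treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def windows(values, width):
    """Yield every contiguous run of `width` values (whole list if shorter)."""
    width = min(width, len(values))
    for i in range(len(values) - width + 1):
        yield values[i:i + width]


def periodicity(seq, residue, period):
    """Fraction of `residue` positions that have `residue` again `period` later."""
    hits = [i for i in range(len(seq) - period) if seq[i] == residue]
    if not hits:
        return 0.0
    return sum(seq[i + period] == residue for i in hits) / len(hits)


def features(seq, width=7):
    """Return a dict of local sequence properties for a non-empty sequence."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    hyd = [KD[a] for a in seq]
    chg = [CHARGE.get(a, 0) for a in seq]
    return {
        "avg_hydrophobicity": sum(hyd) / len(hyd),
        # Highest mean hydropathy over any window of the given width.
        "max_hydrophobicity": max(sum(w) / len(w) for w in windows(hyd, width)),
        # Highest net charge over any window of the given width.
        "max_charge": max(sum(w) for w in windows(chg, width)),
    }
```

Vectors like these, one per protein, would be the points that a
clustering step operates on; which properties and scales are actually
useful is exactly what real data would have to show.
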
Although the correspondence between clusters within a given structural
feature subspace and functional clusters is not always simple, it can
in theory be derived. So far our results are promising, and we have
reached the point where a specific application of the technique is
both feasible and necessary for further development. I would like
to know which databases are available for this purpose. Of
particular interest is the National Biomedical Research Foundation
protein database, which has been used by other researchers in related
work. If you are aware of potentially relevant databases or know how
to acquire them, please let me know by email.
Thank you,
John Reynolds