Rex Eastbourne wrote:
> I just have a plain list of 200 proteins, without data from the
> experiment. I need to cluster the proteins by their inherent
> characteristics (function, ancestry). I used the protein database on
> the NCBI website to get the sequences. Now, I want to take all these
> 200 sequences and get some measure of how similar each is to each
> other.
The standard way to do this is with ClustalX, the graphical front end
to ClustalW. It aligns the amino acid sequence of every protein with
every other (by dynamic programming, along the lines of the
Needleman/Wunsch algorithm) and derives a similarity (distance) matrix
from the pairwise scores. From this matrix it calculates a tree and
writes it to a text file in Newick format. Programs like TreeView can
read this file and display the tree graphically. All these programs are
freely available on the net.
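
To make the idea of an all-against-all similarity matrix concrete, here
is a minimal Python sketch. It is not what ClustalX does internally: it
assumes a plain match/mismatch score and a linear gap penalty instead of
a substitution matrix, and the function names and the three short
sequences are made up for illustration.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of a and b (Needleman-Wunsch, linear gaps).

    The scoring values are illustrative assumptions, not Clustal defaults.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity_matrix(seqs):
    """Score every sequence against every other; returns (names, matrix)."""
    names = list(seqs)
    n = len(names)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = nw_score(seqs[names[i]], seqs[names[i]])
    for i, j in combinations(range(n), 2):
        score = nw_score(seqs[names[i]], seqs[names[j]])
        matrix[i][j] = matrix[j][i] = score
    return names, matrix


# Hypothetical toy sequences, not real proteins
toy = {"p1": "MKTAYIAK", "p2": "MKTAYLAK", "p3": "GGSWQ"}
names, matrix = similarity_matrix(toy)
for name, row in zip(names, matrix):
    print(name, row)
```

For 200 real proteins you would still use ClustalX; the sketch only
shows where the matrix comes from.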
Note, however, that this has nothing to do with protein array data. In a
protein array experiment you measure the expression of proteins in cells
or tissues under different experimental conditions (e.g. the presence or
absence of a disease) and then look for groups of proteins that respond
in a similar way (e.g. expression goes up).
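
For contrast, a minimal Python sketch of similarity in the array sense:
two proteins count as similar when their expression profiles rise and
fall together across samples, whatever their sequences look like.
Pearson correlation is one common choice of measure, assumed here for
illustration; the protein names and values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Invented expression values: two healthy samples, then two diseased ones
profiles = {
    "protein_A": [1.0, 1.2, 3.1, 3.3],  # goes up with disease
    "protein_B": [0.5, 0.6, 1.6, 1.7],  # also goes up
    "protein_C": [2.9, 3.0, 1.1, 0.9],  # goes down
}

r_ab = pearson(profiles["protein_A"], profiles["protein_B"])
r_ac = pearson(profiles["protein_A"], profiles["protein_C"])
print("A vs B:", round(r_ab, 3))
print("A vs C:", round(r_ac, 3))
```

Here A and B would end up in the same group and C in another, and no
sequence information is used at any point.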
"Similarity" thus has totally different meanings in the two fields.