[Protein-analysis] Re: Newbie question about microarray analysis

Derek Potter d.g.potter at btinternet.com
Wed May 31 09:49:11 EST 2006


You might like to go to

The single protein lookup in ProteinCenter™ Open Access is a small subset of
the full functionality provided by the commercial version of ProteinCenter™,
which takes you to a completely new discovery level, enabling comparison of
data sets with thousands of proteins in minutes, with advanced clustering
and filtering to quickly reach biological conclusions.



-----Original Message-----
From: proteins-bounces at oat.bio.indiana.edu
[mailto:proteins-bounces at oat.bio.indiana.edu] On Behalf Of Rex Eastbourne
Sent: Tuesday, May 30, 2006 19:44
To: proteins at magpie.bio.indiana.edu
Subject: [Protein-analysis] Re: Newbie question about microarray analysis

Hi Austin,

I just have a plain list of 200 proteins, without data from the
experiment. I need to cluster the proteins by their inherent
characteristics (function, ancestry). I used the protein database on
the NCBI website to get the sequences. Now, I want to take all these
200 sequences and get some measure of how similar each is to each
other. I figure this would require some specific software that would
allow me to enter all the proteins and see how they're related. I found
ProtoNet, but it seems you can only enter one protein and explore its
specific cluster. Are there any other tools for this I might not be
aware of?
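
A rough sketch of the "how similar is each sequence to every other one"
step, for illustration only. It assumes the sequences are already in memory
as plain strings and scores each pair by shared 3-letter substrings (Jaccard
distance), which is a crude stand-in for a real alignment score. The
sequences and the names protA/protB/protC are invented for the example.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of overlapping k-letter substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b, k=3):
    """1 - shared k-mers / all k-mers; 0.0 means identical k-mer sets."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:  # both sequences shorter than k
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def distance_matrix(seqs, k=3):
    """Build a symmetric all-against-all distance table as nested dicts."""
    names = list(seqs)
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = jaccard_distance(seqs[x], seqs[y], k)
        dist[x][y] = dist[y][x] = d
    return dist


# Toy sequences, made up for this sketch (not real database entries).
toy = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MKTAYIAKQRQISFVKAHFSRQ",  # protA with one residue changed
    "protC": "GSSGSSGLVPRGSHMASMTGGQ",  # unrelated
}
matrix = distance_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(d, 2) for other, d in row.items()})
```

The resulting table is the kind of input a hierarchical clustering routine
expects; an alignment-based score could be swapped in for jaccard_distance
without changing the rest.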

I'm sorry to keep asking you questions like this -- just referring me
to a website that explains this would be greatly appreciated.

Thank you,


Austin P. So (Hae Jin) wrote:
> Rex Eastbourne wrote:
> > Thanks again for replying. The k-means algorithm should be a snap. But
> > how do I convert the proteins, which are in the format
> > "UPSP_SLDJK_HUMAN_P12182" to vectors that can be handled by the
> > mathematical algorithm (i.e. what is the "distance" between two
> > proteins)? Is there already a program that does this? (I understand
> > there's something on the NCBI's website.)
> So, if I understand the format of the data:
> 1. "UPSP_SLDJK_HUMAN_P12182" is just a name...say it is a row id.
> 2. with that name (i.e. in each row), you will have a series of data
> points, each data point corresponding to the amount of protein found in
> patient X (technically you don't have to know if they have the disease
> or not).
> 3. each column (i.e. patient data) will therefore be a
> (multidimensional) data vector, with each protein being an "axis".
>            patient1  patient2  patient3  patient4
> protein1   1         50        49        3
> protein2   2         35        30        1
> protein3   30        20        20        31
> In this way you can apply (hierarchical) k-means clustering on the
> column "vectors".
> Note that you may not get anything either since ultimately your analysis
> is only as good as your data...
> Austin
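
The clustering of patient columns described in the quoted message can be
sketched roughly as follows. This is plain k-means on the example table
above, not the hierarchical variant, and it makes two simplifying
assumptions that are not in the original message: k is fixed at 2, and the
first k columns are used as starting centroids so the result is repeatable.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means; returns one cluster label per point."""
    # Assumption: seed centroids with the first k points (deterministic).
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# The example table: one row per protein, one column per patient.
patients = ["patient1", "patient2", "patient3", "patient4"]
rows = {
    "protein1": [1, 50, 49, 3],
    "protein2": [2, 35, 30, 1],
    "protein3": [30, 20, 20, 31],
}

# Transpose so that each patient becomes one vector with a protein per axis.
columns = [list(col) for col in zip(*rows.values())]

labels = kmeans(columns, k=2)
for patient, label in zip(patients, labels):
    print(patient, "-> cluster", label)
```

With these numbers, patient1 and patient4 end up in one cluster and
patient2 and patient3 in the other.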

Proteins mailing list
Proteins at net.bio.net
