Austin,
You might like to go to
http://www.proxeon.com/proteincenter-open-access.html
The single-protein lookup in ProteinCenter™ Open Access is a small subset of
the full functionality of the commercial version of ProteinCenter™, which
lets you compare data sets containing thousands of proteins in minutes and
offers advanced clustering and filtering to help you reach biological
conclusions quickly.
Regards,
Derek
-----Original Message-----
From: proteins-bounces at oat.bio.indiana.edu
[mailto:proteins-bounces at oat.bio.indiana.edu] On Behalf Of Rex Eastbourne
Sent: Tuesday, May 30, 2006 19:44
To: proteins at magpie.bio.indiana.edu
Subject: [Protein-analysis] Re: Newbie question about microarray analysis
Hi Austin,
I just have a plain list of 200 proteins, without data from the
experiment. I need to cluster the proteins by their inherent
characteristics (function, ancestry). I used the protein database on
the NCBI website to get the sequences. Now, I want to take all these
200 sequences and get some measure of how similar each one is to every
other. I figure this would require specific software that would
allow me to enter all the proteins and see how they're related. I found
ProtoNet, but it seems you can only enter one protein and explore its
specific cluster. Are there any other tools for this I might not be
aware of?
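One way to get the all-against-all similarity measure described above is sketched here under loose assumptions. It swaps in a simple alignment-free measure (Jaccard distance between the sets of overlapping 3-letter substrings of two sequences) in place of alignment scores such as all-vs-all BLAST results, which are the more usual choice for homology, and then groups the proteins by average-linkage (UPGMA-style) hierarchical clustering. The function names and the four short sequences are hypothetical, for illustration only; real input would be the 200 sequences downloaded from NCBI.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Set of overlapping length-k substrings (k-mers) of a protein sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """Jaccard distance between k-mer sets: 0.0 = same k-mers, 1.0 = none shared."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    if not union:
        return 0.0
    return 1.0 - len(sa & sb) / len(union)


def distance_matrix(seqs, k=3):
    """All-against-all distances, keyed by (name1, name2) in both orders."""
    dist = {}
    for x, y in combinations(seqs, 2):
        d = kmer_distance(seqs[x], seqs[y], k)
        dist[(x, y)] = dist[(y, x)] = d
    return dist


def average_linkage(names, dist, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters whose members
    have the smallest mean pairwise distance (the UPGMA criterion)."""
    clusters = [[n] for n in names]

    def mean_dist(c1, c2):
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: mean_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


# Hypothetical toy sequences standing in for the real NCBI downloads.
seqs = {
    "protA": "MKTAYIAKQRQISFVKSHFSRQ",
    "protB": "MKTAYIAKQRQISFVKSHFSRE",
    "protC": "GSHMLEDPVDAFQLGNRWLPQ",
    "protD": "GSHMLEDPVDAFQLGNRWLPE",
}
print(average_linkage(list(seqs), distance_matrix(seqs), n_clusters=2))
```

On these toy sequences the sketch prints [['protA', 'protB'], ['protC', 'protD']]. Exact 3-mer sharing is a crude proxy: distantly related proteins can share very few exact substrings, so alignment-based scores are usually a better basis for grouping by ancestry.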
I'm sorry to keep asking you questions like this -- just referring me
to a website that explains this would be greatly appreciated.
Thank you,
Rex
Austin P. So (Hae Jin) wrote:
> Rex Eastbourne wrote:
> > Thanks again for replying. The k-means algorithm should be a snap. But
> > how do I convert the proteins, which are in the format
> > "UPSP_SLDJK_HUMAN_P12182" to vectors that can be handled by the
> > mathematical algorithm (i.e. what is the "distance" between two
> > proteins)? Is there already a program that does this? (I understand
> > there's something on the NCBI's website.)
>
> So, if I understand the format of the data:
>
> 1. "UPSP_SLDJK_HUMAN_P12182" is just a name; say it is a row id.
> 2. with that name (i.e. in each row), you will have a series of data
> points, each data point corresponding to the amount of protein found in
> patient X (technically you don't have to know if they have the disease
> or not).
> 3. each column (i.e. patient data) will therefore be a
> (multidimensional) data vector, with each protein being an "axis".
>
>             patient1  patient2  patient3  patient4
> protein1           1        50        49         3
> protein2           2        35        30         1
> protein3          30        20        20        31
>
> In this way you can apply hierarchical or k-means clustering to the
> column "vectors".
>
> Note that you may not get anything meaningful, since ultimately your
> analysis is only as good as your data.
>
> Austin
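A minimal sketch of the setup Austin describes, assuming his toy table as input: each patient column becomes a vector with one coordinate per protein, Euclidean distance measures how far apart two patients are, and plain Lloyd's k-means groups the columns. The helper names are hypothetical, and a real analysis would normally normalize the values first and rely on an established statistics library rather than this hand-rolled version.

```python
import math
import random

# Toy table from the message: rows are proteins, columns are patients.
patients = ["patient1", "patient2", "patient3", "patient4"]
table = [
    [1, 50, 49, 3],    # protein1
    [2, 35, 30, 1],    # protein2
    [30, 20, 20, 31],  # protein3
]


def column_vectors(rows):
    """One vector per patient column, with one coordinate ("axis") per protein."""
    return [list(col) for col in zip(*rows)]


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans(vectors, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns a cluster index for each input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels


vectors = column_vectors(table)
for name, label in zip(patients, kmeans(vectors, k=2)):
    print(name, "-> cluster", label)
```

On this table the two groups come out as patient1 with patient4 and patient2 with patient3; the numeric cluster labels themselves are arbitrary.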
_______________________________________________
Proteins mailing list
Proteins at net.bio.net
http://www.bio.net/biomail/listinfo/proteins