Protein scoring matrices in GCG 9

Cedric Govaerts cgovaert at ulb.ac.be
Fri Apr 11 07:21:09 EST 1997

In article <5gods4$8oc at lyra.csx.cam.ac.uk>, tjrc1 at mole.bio.cam.ac.uk says...
>Hi people,
>I have had a number of queries from users on my system about what
>appears (to them) to be major differences between the output from pileup
>in GCG 8 and GCG 9.
>This seems to be due to the new default scoring matrix; does anyone
>know what the rationale was behind this change, and why does it
>produce such different answers to the previous GCG version?  This
>seems to have confused a lot of users.  Of course I can tell them to
>use -matrix=oldpep.cmp if they want the same results as GCG 8, but how
>should they determine which is the appropriate scoring matrix to use?

As explained by Lynn Miller, the default scoring matrix in GCG9 is
blosum62. In GCG8.1, it was the renormalized PAM250. The renormalization
makes no biological sense, because it gives (more or less) the same
weight to a conserved tryptophan as to a conserved glycine.
One could therefore expect the new matrix choice to give better results.
Unfortunately, and for an unexplained reason, this is not what I've
experienced with a set of 30 sequences that are not closely related but
share a common motif.  Pileup in GCG8.1 found that motif very accurately
and gave a very good overall alignment with the default parameters.
Using the default parameters in GCG9, pileup splits the alignment into
several subsets and refuses to align the subsets with one another (it
introduces hundreds of gaps in order to shift the subsets completely
past each other).
I've tried to recover the alignment by modifying the parameters, but
I couldn't get anything decent.
It is important to note that the alignment found by GCG8.1 is biologically
meaningful and not a chance artifact, but that information is lost with GCG9.
The only explanation I can see is that the sequences are too distant to
be aligned correctly and that GCG8.1 found a good alignment "by luck",
the renormalized matrix being, by chance, well adapted to my set
of sequences.
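
To make the tryptophan/glycine point concrete, here is a minimal sketch in
Python. The identity scores are the commonly published integer values for
BLOSUM62 and PAM250, and only four residues are listed. The function
flatten_identities is hypothetical: it simply forces every identity to the
same score, which is an assumption used for illustration and not the actual
GCG renormalization procedure.

```python
# Identity (diagonal) scores for a few residues, taken from the commonly
# published integer versions of each matrix.
BLOSUM62_DIAG = {"W": 11, "C": 9, "G": 6, "A": 4}
PAM250_DIAG = {"W": 17, "C": 12, "G": 5, "A": 2}


def flatten_identities(diag, target=1.0):
    """Hypothetical renormalization: give every identity the same score.

    The target value is arbitrary; this is NOT the real GCG procedure,
    only an illustration of what equal identity weights imply.
    """
    return {residue: target for residue in diag}


def identity_ratio(diag, a, b):
    """How much more a conserved `a` is rewarded than a conserved `b`."""
    return diag[a] / diag[b]


if __name__ == "__main__":
    matrices = {
        "blosum62": BLOSUM62_DIAG,
        "PAM250": PAM250_DIAG,
        "flattened PAM250": flatten_identities(PAM250_DIAG),
    }
    for name, diag in matrices.items():
        ratio = identity_ratio(diag, "W", "G")
        print(f"{name}: W-W / G-G = {ratio:.2f}")
```

With blosum62 or the original PAM250, a conserved tryptophan counts for
noticeably more than a conserved glycine; once the identities are flattened,
the ratio is 1 and that distinction is gone.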

Nevertheless, I've lost faith in pileup, and may try ClustalW or
ClustalX in the future.

If anyone has suggestions, I would appreciate them,

.       Cedric Govaerts                                         .
.       Institut de Recherche Interdisciplinaire                .
.       en Biologie Humaine et Nucleaire                        .
.       808, route de Lennik 1070 Bruxelles                     .
.       email : cgovaert at ulb.ac.be or cgovert at dbm.ulb.ac.be     .
.       Phone : Int + 32 2 555 41 87 / Fax : 555 46 55          .

More information about the Info-gcg mailing list
