Protein scoring matrices in GCG 9

Lynn Miller miller at gcg.com
Wed Mar 19 16:07:41 EST 1997

Tim Cutts (tjrc1 at mole.bio.cam.ac.uk) wrote:

>I have had a number of queries from users on my system about what
>appears (to them) to be major differences between the output from pileup
>in GCG 8 and GCG 9.
>This seems to be due to the new default scoring matrix; does anyone
>know what the rationale was behind this change, and why does it
>produce such different answers to the previous GCG version?  This
>seems to have confused a lot of users.  Of course I can tell them to
>use -matrix=oldpep.cmp if they want the same results as GCG 8, but how
>should they determine which is the appropriate scoring matrix to use?

Dramatic changes have been made to the scoring matrices used
by GCG programs starting with version 9.0 of the Wisconsin Package.

1) Format change leads to rescaled matrices
  In version 8.1, and before, each program in GCG had a corresponding
  scoring matrix. These matrices were filled with floating point (real
  number) values, and the protein matrices were based on the PAM250
  matrix.

  Starting with version 9.0 of the package, the matrices are converted
  to integer values and the values are ten times greater.  Thus a value
  of 1.0 in swgapdna.cmp in previous software versions has been 
  converted and rescaled to a value of 10 in swgapdna.cmp in version
  9.0.  This was done in order to make the matrices provided by the
  Wisconsin Package more similar to scoring matrices provided by others.
  The change in magnitude of the values in the scoring matrices,
  however, necessitated a change in the magnitude of the default gap
  penalties. These changes are documented in the Version 9.0 User
  Notes section that concerns package-wide enhancements: New Scoring
  Matrices.  You can view this on-line with the GCG command:

         genhelp whats_new_90

  This format and rescaling change is particularly apparent for
  matrices, where the matrices are unchanged beyond the reformatting and
  rescaling.

  (NOTE:  A copy of the "old" protein matrix, rescaled and in
  the new format, is available, but we do not recommend its use.
  It is no longer the default matrix.)
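
As a rough illustration of the rescaling described in this section, the
sketch below multiplies floating point matrix values by ten, converts
them to integers, and scales the gap penalties by the same factor. It is
not GCG code: the function names, the matrix entries other than the
1.0 -> 10 case, and the gap penalty values are all hypothetical and are
not actual GCG defaults.

```python
def rescale_matrix(matrix, factor=10):
    """Return an integer copy of a float-valued scoring matrix, scaled by `factor`."""
    return {pair: int(round(value * factor)) for pair, value in matrix.items()}


def rescale_penalty(penalty, factor=10):
    """Scale a gap penalty by the same factor so it stays in proportion."""
    return int(round(penalty * factor))


# Hypothetical old-style (floating point) entries; only the 1.0 -> 10
# conversion is taken from the text above.
old_matrix = {("A", "A"): 1.0, ("A", "C"): 0.0, ("C", "C"): 1.0}
new_matrix = rescale_matrix(old_matrix)

# Hypothetical old gap penalties, not actual GCG defaults.
old_gap_creation, old_gap_extension = 5.0, 0.3
new_gap_creation = rescale_penalty(old_gap_creation)
new_gap_extension = rescale_penalty(old_gap_extension)

print(new_matrix[("A", "A")], new_gap_creation, new_gap_extension)
```

Because every score and every penalty is multiplied by the same factor,
the relative ranking of alignments should not change under rescaling
alone; only the magnitude of the reported scores does.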

2) BLOSUM62 is the new protein matrix default for most programs

  Starting with version 9.0, the default matrix used for most programs
  has changed.  For protein alignments, most programs now use BLOSUM62
  as the default scoring matrix (FastA uses BLOSUM50).
  Even if we hadn't made a change in the format of the matrices,
  we would still have changed the default protein scoring matrix.
  The BLOSUM62 matrix (now used by all GCG programs except FastA)
  is the matrix most accepted in the scientific literature, and has
  long been the default matrix used by the BLAST program.  For more
  information on this matrix I highly recommend the paper:

    Henikoff, S. and Henikoff, J. G. (1992). 
    Amino acid substitution matrices from protein blocks.  
    Proc. Natl. Acad. Sci. USA 89: 10915-10919.  

It is not surprising that your results are different when using
the new BLOSUM62 scoring matrix.  We believe that the results
with the new matrices are more valid scientifically.  You might
also want to experiment with the gap creation and extension penalties,
since the ideal ones to use can be different for each alignment.
Regardless of the matrix and penalties used, it is always a good
idea to visually inspect the alignment to make sure that it makes
sense to you.
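
To make the point about gap penalties concrete, here is a minimal
sketch that scores two candidate alignments of the same pair of
sequences under different gap creation and extension penalties. It is
an illustration only, not how any GCG program is implemented. The
match/mismatch scores and penalty values are toy numbers, and the
convention that a gap of length n costs creation + n * extension is an
assumption made for this example.

```python
def score_alignment(row_a, row_b, match, mismatch, gap_creation, gap_extension):
    """Score a given pairwise alignment; '-' marks a gap position.

    Assumed convention: a gap of length n costs gap_creation + n * gap_extension.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    score = 0
    gap_side = None  # which row the current gap is in, or None
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            side = "a" if a == "-" else "b"
            if side != gap_side:
                score -= gap_creation  # a new gap is opened
                gap_side = side
            score -= gap_extension
        else:
            gap_side = None
            score += match if a == b else mismatch
    return score


# Two candidate alignments of ACGTACGA and ACTACGAA (toy sequences).
ungapped = ("ACGTACGA", "ACTACGAA")
gapped = ("ACGTACGA-", "AC-TACGAA")

for creation, extension in [(50, 3), (20, 3)]:
    u = score_alignment(*ungapped, match=10, mismatch=-5,
                        gap_creation=creation, gap_extension=extension)
    g = score_alignment(*gapped, match=10, mismatch=-5,
                        gap_creation=creation, gap_extension=extension)
    print(creation, extension, "ungapped:", u, "gapped:", g)
```

With the higher creation penalty the ungapped alignment scores better;
with the lower one the gapped alignment does, which is why the
preferred alignment can change when the penalties are adjusted.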

Lynn Miller
       Lynn Miller                   ||   phone: (608) 231-5200
       Technical Support Coordinator ||     fax: (608) 231-5202
       Genetics Computer Group, Inc. ||  e-mail: help at gcg.com
       575 Science Drive             ||  e-mail: Lynn.Miller at gcg.com
       Madison, WI  53711-1060 USA   ||  e-mail: miller at gcg.com
                                     ||     WWW: http://www.gcg.com

More information about the Info-gcg mailing list
