Gonnet et al.

Michael Gribskov gribskov at SDSC.EDU
Mon Jun 29 17:08:11 EST 1992

  Not to belabor the point, but I had a few comments about Gaston Gonnet's
  response to comments on his recent Science paper (> marks comments by Gonnet).
  >I should note that I received a copy of the article, as it appeared, 
  >only 3 days ago. Checking the above remark, I contemplated with 
  >horror that Fig 2 has been mislabelled in the following way: What 
  >is presented in Fig 2 is a Dayhoff matrix for PAM 250, it is the 
  >best approximation that we could compute at this time. It says 
  >however, "The recommended mutation matrix.." It should say "The 
  >recommended Dayhoff matrix...".
  Personally, I believe this nomenclature to be extremely unfortunate. The
  term "Dayhoff matrix" is nearly universally used to mean the MDM78
  (mutation data scoring matrix; log-odds matrix for 250 PAMs). To call the
  Gonnet et al. matrix a Dayhoff matrix is to imply that it was derived by
  Dayhoff's methodology. A less connotation-loaded term would be log-odds
  matrix. The term PAM-250 matrix has also been used virtually as a synonym
  for the MDM78 matrix. 
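  To make the distinction concrete, here is a minimal sketch of how a log-odds
  matrix can be derived from a mutation (probability) matrix, assuming the
  usual form 10*log10(M[i][j] / f[i]) rounded to integers. The function name,
  the two-letter alphabet, and all numbers are made up for illustration; they
  are not taken from MDM78 or from Gonnet et al.

```python
import math


def log_odds_matrix(mutation, freqs, scale=10.0):
    """Turn a mutation probability matrix into a log-odds scoring matrix.

    mutation[i][j]: probability that residue j is replaced by residue i
                    at the chosen evolutionary distance (columns sum to 1).
    freqs[i]:       background frequency of residue i.
    """
    n = len(freqs)
    return [
        [round(scale * math.log10(mutation[i][j] / freqs[i])) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical two-residue alphabet, both residues equally frequent.
toy_mutation = [[0.8, 0.2],
                [0.2, 0.8]]
toy_freqs = [0.5, 0.5]

toy_scores = log_odds_matrix(toy_mutation, toy_freqs)
print(toy_scores)  # [[2, -4], [-4, 2]]
```

  Every entry of the toy mutation matrix lies between 0 and 1, whereas the
  derived log-odds matrix contains negative scores for the unfavored
  replacements.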
  >Now, some people have immediately recognized this as a Dayhoff matrix, 
  >which is good. A mutation matrix has all positive entries, is diagonally 
  >dominant and has no entry greater than 1. So Fig 2 is not a mutation 
  >matrix but a Dayhoff matrix. This matrix is the one we recommend to 
  >be used, together with our new deletion-penalty formula, for the N&W 
  It seems unlikely that such a matrix has been used with the
  Needleman-Wunsch algorithm. As you may recall, the NW algorithm (Needleman
  and Wunsch, J. Mol. Biol. 48, 443-453, 1970) does not use an affine gap
  cost, although they do suggest that the "penalty factor could be a function
  of the size and/or direction of the gap". NW requires a scoring table with
  all positive values since only the last row and column of the alignment
  matrix are examined for the maximum score. A scoring table with negative
  values is not guaranteed to give an optimum alignment with the NW
  algorithm. Note that Needleman and Wunsch refer to the position containing 
  the maximum score as being in the first row or column since they build the 
  alignment from N to C terminus; however, they mean the last row or column 
  calculated during the alignment. Since many sequences in the database have 
  locally similar segments embedded in unrelated sequences (i.e. cases of 
  partial homology or gene fusion), one wonders what kind of alignments would 
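
  The concern about scanning only the last row and column can be sketched
  with a toy example. The code below is a simplified dynamic-programming
  recurrence with a linear gap penalty and free leading overhangs, not the
  original Needleman-Wunsch formulation; the function name, sequences, and
  scores are all hypothetical. It reports the best score found on the last
  row or column alongside the best score found anywhere in the matrix.

```python
def edge_vs_overall_best(a, b, score, gap):
    """Fill an alignment matrix and compare two ways of reading the result.

    Returns (edge_best, overall_best):
      edge_best    - maximum over the last row and last column only
      overall_best - maximum over every cell of the matrix
    """
    rows, cols = len(a) + 1, len(b) + 1
    # First row and column stay 0, so leading overhangs are not penalized.
    H = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                H[i - 1][j] - gap,                            # gap in b
                H[i][j - 1] - gap,                            # gap in a
            )
    edge_best = max(max(H[-1]), max(row[-1] for row in H))
    overall_best = max(max(row) for row in H)
    return edge_best, overall_best


# A shared segment (ACDEF) embedded in unrelated flanking sequence.
seq_a = "KKKKACDEFKKKK"
seq_b = "LLLLACDEFLLLL"

# Table with a negative mismatch score: the best cell is in the interior.
negative = edge_vs_overall_best(seq_a, seq_b, lambda x, y: 2 if x == y else -1, gap=2)
print(negative)  # (2, 6)

# Table with only positive scores: the best cell lies on the last row/column.
positive = edge_vs_overall_best(seq_a, seq_b, lambda x, y: 3 if x == y else 1, gap=2)
print(positive[0] == positive[1])  # True
```

  With the all-positive table every diagonal step raises the score, so the
  maximum necessarily ends up on the last row or column. With negative
  mismatch scores the unrelated flanks pull the score down after the shared
  segment, and the edge scan no longer sees the highest-scoring region.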
  I think there is another point that is being overlooked. Dayhoff et al. did
  not use closely related sequences to calculate the MDM78 matrix simply because
  they were unable to align distantly related sequences. A primary reason was
  to be sure that they were comparing sequences that differed only minimally
  in function. Sequences that are no more than 15% different in sequence are
  much less likely to adopt grossly different three-dimensional structures
  than those that are, for example, 40% different. In the case of less
  similar sequences you measure not only the probability that a given
  single residue mutation can be accepted at a certain position, but also the
  overlaying probability that the conformation of the whole segment has
  changed and that only some combinations of segment sequences can fold into
  active structures. For distantly related sequences you are in the position
  of comparing apples and oranges: the two positions you are comparing are
  likely to have different structures and functions (in the micro-structural
  sense, not necessarily the enzymatic sense). It is not surprising that 250
  PAM log-odds matrices extrapolated from pairs at different evolutionary
  distances differ; one should be surprised if they did not. Another way of
  looking at this is that as you examine greater and greater evolutionary
  distances, you see more adaptive differences and fewer random (neutral or
  nearly neutral) differences. 
  Since I was not around when the original MDM78 work was done, I don't know
  how important these various considerations were in formulating the
  analysis. Perhaps someone at PIR could mention some of the unpublished
  background sometime -- I think it would be very interesting to know more
  about the context that this work was done in. It seems to me that, for all
  of its perceived faults, the analysis that produced the MDM78 matrix was
  very perceptive and years ahead of its time with respect to the interaction
  between sequence and structure. 

  Michael Gribskov
  San Diego Supercomputer Center
  gribskov at sdsc.edu
  (619) 534 - 8312

More information about the Bio-soft mailing list
