David Saul (dj.saul at auckland.ac.nz) wrote:
> Can't find this in the documentation.
> Can anybody tell me how the checksum is calculated in GCG formatted sequences?
> Also each sequence in an .msf file has its own checksum but there is
> another in the header. What is this calculated from?
The information is given in the documentation, though it can take some
time to find.
There are two different algorithms: one for general text and the other
for sequence data.
A Pascal implementation of the sequence algorithm is:
function upcchr(achr: char): char;
begin
  if (achr in ['a'..'z']) then
    upcchr := chr(ord(achr) + ord('A') - ord('a'))
  else
    upcchr := achr;
end;

function uwcheckinc(sptr: integer;
                    residue: char): integer;
{ returns the incremental value for this position and base in UW
  checksum }
begin
  uwcheckinc := (1 + ((sptr-1) mod 57))*ord(upcchr(residue));
end; { of uwcheckinc }

function uw_sumcheck(var seq: packed array
                       [lb..ub: integer] of char;
                     slength: integer): integer;
{ returns accumulated total checksum for UWGCG based on contents
  of array, and length. Case-independent. }
var
  sptr: integer;
  sum: integer;
begin
  sum := 0;
  for sptr := 1 to slength do
    sum := sum + uwcheckinc(sptr,seq[sptr]);
  uw_sumcheck := sum;
end; { of function uw_sumcheck }
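
For anyone who wants to cross-check the Pascal without a Pascal compiler,
here is a rough Python transcription of the same functions. It is only a
sketch: the names uw_checkinc and gcg_check are mine, and the final
"mod 10000" step in gcg_check is an assumption that is not part of the
Pascal above (the checksums written in GCG files are generally described
as the accumulated total reduced modulo 10000).

```python
def upcchr(achr: str) -> str:
    """Upper-case a single character, touching only 'a'..'z' as the Pascal does."""
    if "a" <= achr <= "z":
        return chr(ord(achr) + ord("A") - ord("a"))
    return achr


def uw_checkinc(pos: int, residue: str) -> int:
    """Incremental value for one residue; pos is 1-based.

    The weight runs 1, 2, ..., 57 and then starts again at 1.
    """
    return (1 + (pos - 1) % 57) * ord(upcchr(residue))


def uw_sumcheck(seq: str) -> int:
    """Accumulated, case-independent total over the whole sequence."""
    return sum(uw_checkinc(pos, ch) for pos, ch in enumerate(seq, start=1))


def gcg_check(seq: str) -> int:
    # Assumption (not in the Pascal above): the value stored in a GCG file
    # is the accumulated total reduced modulo 10000.
    return uw_sumcheck(seq) % 10000
```

As a worked case, "ACGT" gives 1*65 + 2*67 + 3*71 + 4*84 = 748, and "acgt"
gives the same value because each residue is upper-cased first.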
============================================================
Peter A. Stockwell
Dept of Biochemistry,
University of Otago
Dunedin, New Zealand.
> Thanks in advance
> Dave Saul
> --
> David Saul dj.saul at auckland.ac.nz
> School of Biological Sciences Tel 64 9 3737599 Ext 7712
> University of Auckland FAX 64 9 3737414
> Auckland Private bag 90219
> New Zealand