Jennifer Hallinan wrote:
> Can anyone tell me how to compute the checksum in the header of a GCG
> format DNA sequence file?
> Thanks,
> Jennifer
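GCG checksums are calculated with a simple hash, much like
the hash function examples in K&R: each residue's character code is
multiplied by a position weight that cycles 1, 2, ..., 57, 1, 2, ...
along the sequence, and the weighted sum is taken modulo 10000.
Here's an example in C, with SwissProt:CALM_HUMAN as the test sequence.
The checksum should be 2160.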
Hope this helps,
Guy.
--
/* START EXAMPLE */
#include <stdio.h>
#include <ctype.h>
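/* GCG checksum: each letter's character code (upper-cased) is
   multiplied by a position weight that cycles 1, 2, ..., 57, 1, ...,
   and the weighted sum is reported modulo 10000. */
static int CheckSumGCG(const char *seq){
    long i, check = 0;
    for(i = 0; seq[i] != '\0'; i++){
        /* cast: passing a negative char to isalpha/toupper is undefined */
        unsigned char c = (unsigned char)seq[i];
        if(isalpha(c))
            /* reduce at every step so long sequences cannot overflow */
            check = (check + ((i % 57) + 1) * toupper(c)) % 10000;
    }
    return (int)check;
}
int main(void){
    const char *calm_human =
        "ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD"
        "MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGYI"
        "SAAELRHVMTNLGEKLTDEEVDEMIREADIDGDGQVNYEEFVQMMTAK";
    printf("Human Calmodulin GCG Checksum = %d\n",
           CheckSumGCG(calm_human));
    return 0;
}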
/* END EXAMPLE */
--
----------------------------------------------------------------------
Guy St.C. Slater, Tel : (44) 1223 494 565
Human Genome Mapping Project Resource Centre, Fax : (44) 1223 494 512
Wellcome Trust Genome Campus, mailto:gslater at hgmp.mrc.ac.uk
Hinxton, Cambridge, CB10 1SB. http://www.hgmp.mrc.ac.uk/~gslater/
----------------------------------------------------------------------