Yes, there is a better way than dotplot and repeat, and it can be easily done
with GCG. There are two ways to look at this problem:
1) Use WINDOW/STATPLOT to produce a "moving average" of CpG %,
2) Use MAPPLOT with a data file that includes CG, GC and SS sequence patterns.
The WINDOW method produces a graphical output of the CpG %. Usually I compare
this to the GpC %, as well as the SpS % (S = G or C). In mammalian genomes
the CpG % is low (~2%) whereas the GpC % is higher (~5%-10%). In a CpG island
the CpG % is 2- to 4-fold higher than in the flanking regions (i.e. 4%-8%). I have
published this type of analysis: White, Liu and Wilson (1996) Arch. Biochem.
Biophys. 335: 161-172. This is a two-step method: you first run WINDOW and
then use STATPLOT to plot the output; the actual work is done by WINDOW.
I used window sizes of 100, 200 or 300 bases (300 worked best) with the
default increment (3 bp). Be sure to select the appropriate printer language
before you run STATPLOT. (I use HPGL and send the output to a file, so that
I can print it on a Laserjet III.)
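
For anyone without GCG at hand, the moving-average idea can be sketched in a
few lines of Python. This is only an illustration and not how WINDOW itself is
implemented: the function name dinucleotide_percent is made up for this
example, and I am assuming that the percentage is the number of matching
dinucleotides divided by the number of dinucleotide positions in the window.
The defaults copy the 300-base window and 3 bp increment mentioned above.

```python
def dinucleotide_percent(seq, first, second, window=300, step=3):
    """Return (start, percent) pairs for a sliding window over seq.

    first/second are the sets of bases allowed at each position of the
    dinucleotide, e.g. "C", "G" for CpG or "CG", "CG" for SpS.
    The percent definition (hits / dinucleotide positions in the window)
    is an assumption of this sketch, not taken from GCG.
    """
    if window < 2:
        raise ValueError("window must cover at least one dinucleotide")
    seq = seq.upper()
    pairs = window - 1  # dinucleotide positions inside one window
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        hits = sum(
            1 for i in range(pairs)
            if chunk[i] in first and chunk[i + 1] in second
        )
        # Report 1-based start coordinates, as sequence programs usually do.
        results.append((start + 1, 100.0 * hits / pairs))
    return results


if __name__ == "__main__":
    demo = "ATGCGCGTTACGCGCGGCATTTAAACGATCG"  # made-up sequence
    for name, a, b in (("CpG", "C", "G"), ("GpC", "G", "C"), ("SpS", "CG", "CG")):
        print(name, dinucleotide_percent(demo, a, b, window=10, step=3))
```

Plotting the three resulting curves against the start coordinate gives the
same kind of CpG / GpC / SpS comparison described above.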
The MAPPLOT method is easier, but requires that you create the data file.
The data file has the same format as the restriction enzyme data file, e.g.
Name   Offset   Pattern   Overhang   Documentation
CpG    1        CG        0          !
GpC    1        GC        0          !
SpS    1        SS        0          !
To use the data file the command line switch "-dat=DataFileName" must be used.
So the command would look like this:
MAPPLOT -dat=DataFileName SequenceFileName
Once again, you have to select the appropriate printer language before you
run MAPPLOT. (This method is just like making a restriction enzyme map, but
using a different data file to search for different patterns.) The output
from this method will be 3 lines with tick marks for each occurrence of the
specified pattern, just like a restriction map.
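
To show what those tick-mark lines amount to, here is a rough Python sketch
of the same idea. It is not MAPPLOT and does not read the GCG data file; the
names IUPAC, pattern_positions and tick_line are made up for this example,
and only the ambiguity code S (G or C) is handled besides the four bases.

```python
# Minimal ambiguity table: only what the CG, GC and SS patterns need.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG"}


def pattern_positions(seq, pattern):
    """Return 0-based start positions where pattern matches seq."""
    seq = seq.upper()
    n = len(pattern)
    return [
        i for i in range(len(seq) - n + 1)
        if all(seq[i + j] in IUPAC[pattern[j]] for j in range(n))
    ]


def tick_line(seq, pattern):
    """Return a string as long as seq with '|' at each match start."""
    marks = [" "] * len(seq)
    for i in pattern_positions(seq, pattern):
        marks[i] = "|"
    return "".join(marks)


if __name__ == "__main__":
    demo = "ATGCGCGTTACGCGCGGCATTTAAACGATCG"  # made-up sequence
    print("seq", demo)
    for name, pattern in (("CpG", "CG"), ("GpC", "GC"), ("SpS", "SS")):
        print(name, tick_line(demo, pattern))
```

Each printed line lines up under the sequence, so clusters of ticks on the
CpG line that are not matched by a general rise on the SpS line are the
regions worth a closer look.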
Try both methods and decide for yourself which is most useful. If you need
further clarification on the details I've outlined, let me know. I hope
you find this helpful.
> I'm currently looking for program which could run DNA methylation pattern
> analysis (i.e. find CpG island within a gene interested). Some programs in
> GCG, like repeat or dotplot, is helpful to do this, but they can't present
> a bar graph for methylation pattern looked like |||||||| || ||||||,
> which aligned with a DNA sequence. If you have suggestions or know better
> way to do it, please e-mail me. Thank you.
>> Wei Yu
e-mail: whitejo at pilot.msu.edu
snailmail: 301 Biochemistry
Michigan State University
East Lansing, MI 48824