Over the years, I've done various studies that suggest that DNA and RNA
can be modelled as a "Markov" sequence. Genetic sequences are obviously
not random, but very "DNA-like" sequences can be created by introducing
a Markov correlation between neighboring bases. For DNA and RNA, a
"Markov process" means that the bases are not independent: the
preceding base in a sequence affects the probability of the base that
follows it. Sequences produced in this manner can "fool" you, because
statistically they look so much like real genetic sequences. These
correlations are not hard to model on a computer.
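One way to see this kind of neighbor dependence in a concrete sequence
is to count adjacent pairs of bases and divide by how often each base
appears as a predecessor. This gives estimates of P(next base | previous
base). The Python sketch below is a minimal illustration. The function
name neighbor_frequencies is hypothetical, and it looks only one base
back, which is the first-order dependence described above.

```python
from collections import Counter


def neighbor_frequencies(seq):
    """Estimate P(next base | previous base) from adjacent pairs in seq.

    Returns a dict mapping each preceding base to a dict of
    {following base: conditional frequency}.
    """
    pair_counts = Counter(zip(seq, seq[1:]))  # counts of each adjacent pair
    prev_counts = Counter(seq[:-1])           # times each base has a successor
    result = {}
    for (prev, nxt), n in pair_counts.items():
        result.setdefault(prev, {})[nxt] = n / prev_counts[prev]
    return result
```

For example, in the short string "GGCGCCGGGC", G follows G in 3 of the 6
cases where G is the predecessor (frequency 0.5), while G follows C in 2
of 3 cases (about 0.67). In a sequence with no neighbor correlation, the
rows for different preceding bases should agree up to sampling noise.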
Since I don't have access to RNA folding programs, I'd like to know
whether anyone would be willing to fold an artificial Markov RNA
sequence to see whether its folded structure looks like that of "real"
RNA. A recipe for producing such a sequence is given below. It
generates what I call a Markov GC sequence; try P0 = 0.3, P1 = 0.7.
You can also produce a 4-valued sequence (GCAU) using the same
approach. If we find any interesting results, I'd be happy to coauthor
a short paper on this with interested parties.
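The same approach can be extended to four bases by giving each
preceding base its own row of probabilities for the next base. The
sketch below assumes such a 4 x 4 table. The TRANSITIONS values and the
function name markov_rna are hypothetical placeholders chosen only for
illustration; they are not values measured from real RNA.

```python
import random

BASES = "GCAU"

# Hypothetical transition table: TRANSITIONS[prev][nxt] is the probability
# that base nxt follows base prev. Each row must sum to 1.
# These numbers are illustrative only, not taken from real RNA.
TRANSITIONS = {
    "G": {"G": 0.4, "C": 0.2, "A": 0.2, "U": 0.2},
    "C": {"G": 0.2, "C": 0.4, "A": 0.2, "U": 0.2},
    "A": {"G": 0.2, "C": 0.2, "A": 0.4, "U": 0.2},
    "U": {"G": 0.2, "C": 0.2, "A": 0.2, "U": 0.4},
}


def markov_rna(length, transitions=TRANSITIONS, start="G", rng=random):
    """Generate a first-order Markov sequence over G, C, A, U.

    `start` plays the role of a phantom previous base for the first draw,
    much like olddata = 0 in the two-state recipe.
    """
    seq = []
    prev = start
    for _ in range(length):
        row = transitions[prev]
        # Draw one base, weighting each candidate by its row entry.
        prev = rng.choices(BASES, weights=[row[b] for b in BASES])[0]
        seq.append(prev)
    return "".join(seq)
```

In this illustrative table, each base is twice as likely to be followed
by itself as by any other single base, so runs of identical bases are
more common than in a purely random sequence. Passing a seeded
random.Random instance as rng makes the output reproducible.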
Below is a method for generating a correlated sequence of Gs and Cs, in
which each base has "knowledge" of the base that comes before it. For
simplicity, let's work with a random binary sequence B(i), where i
counts positions along the sequence and each base is symbolized by 0
or 1. P0 and 1 - P0 are the probabilities that B(i) is equal to zero
or one, respectively, if B(i-1) is equal to zero. P1 and 1 - P1 are
the probabilities that B(i) is equal to one or zero, respectively, if
B(i-1) is equal to one. Since each value of B(i) depends on the value
of B(i-1), even small deviations from randomness (i.e., P0 and P1 not
equal to 0.5) change the statistics of the whole sequence.
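Following the convention of the pseudocode recipe (P0 is the
probability of a 0 after a 0, and P1 is the probability of a 1 after a
1), the rule is a two-state Markov chain. Below, T is its transition
matrix, with rows indexed by B(i-1) and columns by B(i), state 0 first.
Standard two-state chain results give the long-run fraction pi_1 of
ones (written as G in the recipe) and the correlation between bases k
positions apart, which decays as a power of the second eigenvalue
lambda of T. The formulas assume P0 and P1 are both below 1, so that
the long-run sequence is not constant.

```latex
T = \begin{pmatrix}
      P_0     & 1 - P_0 \\
      1 - P_1 & P_1
    \end{pmatrix},
\qquad
\lambda = P_0 + P_1 - 1,
\qquad
\pi_1 = \frac{1 - P_0}{2 - P_0 - P_1},
\quad
\pi_0 = \frac{1 - P_1}{2 - P_0 - P_1},
\qquad
\operatorname{corr}\bigl(B(i),\, B(i+k)\bigr) = \lambda^{k}.
```

As a check, P0 = P1 = 0.5 gives lambda = 0 and pi_1 = 0.5, which is
the purely random case. Under this convention, P0 = 0.3, P1 = 0.7 also
gives lambda = 0, i.e. a G-rich (about 70% G) sequence without neighbor
correlation. Pairs with P0 + P1 different from 1, such as
P0 = P1 = 0.7 (lambda = 0.4, pi_1 = 0.5), are the ones that make
neighboring bases correlated.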
Note: A method for generating a Markov process is included below. If
P0 and P1 are both 0.5, it generates a purely random sequence of 0s
and 1s.
begin
  olddata := 0;
  for i := 1 to 100 do   /* one pass per base of a 100-base RNA sequence */
  begin
    Random(result);      /* return a random number on (0,1) */
    if olddata = 0 then
      if result < P0 then data := 0 else data := 1;
    if olddata = 1 then
      if result < P1 then data := 1 else data := 0;
    if data = 1 then Write("G");
    if data = 0 then Write("C");
    olddata := data
  end
end;
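Here is a minimal Python translation of the recipe, assuming Python 3's
standard random module as the source of uniform random numbers. The
function name markov_gc and its parameter names are my own, not part of
the original. It follows the pseudocode step by step, including the
convention that the chain starts from a previous value of 0. One small
difference is that random.random() returns values on [0, 1) rather than
(0, 1), which makes no practical difference to the comparisons with P0
and P1.

```python
import random


def markov_gc(length, p0, p1, rng=random):
    """Generate a two-state Markov sequence written as G and C.

    State 0 is written as "C" and state 1 as "G", as in the recipe.
    p0: probability of a 0 when the previous value was 0.
    p1: probability of a 1 when the previous value was 1.
    """
    olddata = 0  # the recipe starts from a previous value of 0
    out = []
    for _ in range(length):
        result = rng.random()  # uniform on [0, 1)
        if olddata == 0:
            data = 0 if result < p0 else 1
        else:
            data = 1 if result < p1 else 0
        out.append("G" if data == 1 else "C")
        olddata = data
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed so runs are repeatable
    print(markov_gc(100, 0.3, 0.7, rng))
```

Passing a seeded random.Random instance, as in the main block, makes a
given sequence reproducible, which is convenient if you want to share
the exact sequence that was folded.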