How do I find sites?

Don Gilbert gilbertd at sunflower.bio.indiana.edu
Wed Sep 23 18:54:39 EST 1992

SeqApp does restriction site mapping in this way:

a) read in a table of restriction enzymes with their sites
	(see Rich Roberts' REBASE, available in various formats)

b) read the sequence in from a file

c) for each restriction enzyme, find its cut points on the
   sequence.  This is basic pattern matching, common to much
   biosequence analysis software.

The particular algorithm I use is derived from general
string matching (as used in text editors and many other
programs).  The nucleotide pattern of the enzyme's
cut site is slid along the sequence and each matching
position is recorded.  Some finesse is needed to deal
with ambiguous bases, reverse complements, etc.
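
The sliding match described above can be sketched roughly as follows.
This is an illustration in Python, not SeqApp's own code: the names
find_sites, map_enzyme and reverse_complement are made up for the
example, and the table of ambiguity codes is the standard IUPAC
nucleotide alphabet.  It reports where each site pattern starts;
the offset of the actual cut within the site is enzyme-specific and
would come from the enzyme table.

```python
# IUPAC nucleotide codes: each pattern letter maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of every IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(pattern):
    """Return the pattern as it reads on the opposite strand."""
    return pattern.upper().translate(COMPLEMENT)[::-1]


def find_sites(sequence, pattern):
    """Slide pattern along sequence; return the 0-based start of each match.

    Ambiguity is honoured in the pattern only. An ambiguous letter in the
    sequence itself never matches (a simplifying assumption of this sketch).
    """
    sequence = sequence.upper()
    pattern = pattern.upper()
    hits = []
    for start in range(len(sequence) - len(pattern) + 1):
        window = sequence[start:start + len(pattern)]
        if all(base in IUPAC[code] for base, code in zip(window, pattern)):
            hits.append(start)
    return hits


def map_enzyme(sequence, site):
    """Return (forward hits, reverse-strand hits) for one enzyme site.

    A palindromic site reads the same on both strands, so the reverse
    search is skipped to avoid reporting every hit twice.
    """
    forward = find_sites(sequence, site)
    rc = reverse_complement(site)
    reverse = [] if rc == site.upper() else find_sites(sequence, rc)
    return forward, reverse
```

For example, map_enzyme(seq, "GAATTC") searches for the EcoRI site;
since that site is its own reverse complement, only the forward list
is filled in.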

Hash tables are a way to do this more quickly if need be,
at added complexity.  See, for instance, FastA source*
by Pearson and Lipman for hash table use in matching 
a sequence to a library of sequences. This complexity is 
probably not needed for r.e. mapping.

The basics here, as in a lot of gene sequence analysis,
come down to sliding one sequence of letters against
another and recording the matches.
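
As a rough idea of what the hash table approach mentioned above looks
like, here is a small Python sketch.  It is not taken from the FastA
source; it only shows the general trick of indexing every k-letter
word of the sequence once, so that later lookups jump straight to
candidate positions instead of sliding over the whole sequence.  The
names build_index and lookup, and the choice of k, are assumptions
made for the example, and it handles exact (unambiguous) patterns only.

```python
from collections import defaultdict


def build_index(sequence, k):
    """Map each k-letter word in sequence to the positions where it starts."""
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return index


def lookup(index, sequence, pattern, k):
    """Find exact matches of pattern (length >= k) using the word index.

    The first k letters of the pattern select candidate positions from
    the hash table; each candidate is then checked against the full pattern.
    """
    candidates = index.get(pattern[:k], [])
    return [pos for pos in candidates if sequence.startswith(pattern, pos)]
```

The index costs extra memory and setup time, which pays off only when
many patterns are searched against the same long sequence.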

* One distribution site is anonymous ftp to ftp.bio.indiana.edu,
 as /molbio/search/fasta16c2.shar.Z
 You may want to poke around this archive for other program
 source examples.  The archive started as my personal library
 of examples to help me learn to write biocomputing software.

Don Gilbert                                     gilbert at bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405

More information about the Bio-soft mailing list
