Pattern Searching

James Tisdall tisdall at amalthea.humgen.upenn.edu
Wed Jan 12 09:25:13 EST 1994


>Does anyone know of a good pattern searching program that runs on the
>Macintosh (other than MacPattern, which analyzes single sequences but not
>databases)?

The program DNA WorkBench, which runs on Unix, Mac (and PC), offers very
powerful pattern searching on sequences (or text).  It uses a
full-featured "regular expression" package, which is documented on-line in
the "help regular" and "help regexp" commands.  For example, to find
all runs of five or more consecutive AT dinucleotides in GenBank:

DNA(At 0 of 0)% sequence (AT){5,} gball 
searching...gbsyn...gbphg...gbuna...gbpat...gbrna...gbmam...gbest...gbvrt...gbinv...gbvrl...gbpln...gbbct...gbrod00...gbrod01...gbpri00...gbpri01...gbnew... ...Done(144 seconds).
DNA(At 3519 of 3519)% 
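
For readers without DNA WorkBench at hand, the same kind of search can be
sketched with Python's standard "re" module.  This is only an illustration
of the regular expression, not of how DNA WorkBench works internally; the
record names, the sequences, and the function find_hits are made up for the
example and do not come from GenBank.

```python
import re

# Same pattern as in the session above: five or more consecutive AT
# dinucleotides. (?:...) groups without capturing.
AT_REPEAT = re.compile(r"(?:AT){5,}")

# Hypothetical toy "database": invented names and sequences, not GenBank data.
records = {
    "seq1": "GGC" + "AT" * 6 + "GGC",        # six AT repeats: should match
    "seq2": "GGC" + "AT" * 4 + "GGC",        # only four repeats: no match
    "seq3": "AT" * 5 + "CC" + "GT" * 5,      # exactly five repeats: should match
}


def find_hits(records, pattern=AT_REPEAT):
    """Return the names of records whose sequence contains the pattern."""
    return [name for name, seq in records.items() if pattern.search(seq)]


hits = find_hits(records)
print(hits)
```

Note that {5,} sets only a lower bound, so longer runs match as well, and
that the GT run in seq3 is ignored because the pattern names AT explicitly.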

Then, to investigate a particular sequence:

DNA(At 3519 of 3519)% point 1
DNA(At 1 of 3519)% head
     1 AGHCH1G Synthetic hamster-human hybrid cell (HCH-1) HSAG-2 gene A
DNA(At 1 of 3519)% regexp (AT){5,}
     Found 1 hit:
       7086 ATATATATATATATATATATATATATATATATATATATAT

	7081 AGATTATATATATATATATATATATATATATATATATATATATATACATATGTTTGTTGT
		   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~               
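
A display like the one above (1-based hit position, the matched text, and a
marker line under the surrounding sequence) can be approximated in Python as
follows.  Again this is a rough sketch under my own assumptions: the function
name show_hits and the "flank" parameter are hypothetical, the layout is not
DNA WorkBench's exact output format, and the test sequence is constructed for
the example rather than copied from the GenBank entry.

```python
import re


def show_hits(seq, pattern=r"(?:AT){5,}", flank=5):
    """Return display lines for each match: position, context, marker."""
    lines = []
    for m in re.finditer(pattern, seq):
        # Report positions 1-based, as sequence tools conventionally do.
        lines.append(f"{m.start() + 1} {m.group()}")
        # Show the hit with up to `flank` bases of context on either side.
        ctx_start = max(0, m.start() - flank)
        lines.append(seq[ctx_start:m.end() + flank])
        # Caret under the first matched base, tildes under the rest.
        offset = m.start() - ctx_start
        lines.append(" " * offset + "^" + "~" * (len(m.group()) - 1))
    return lines


# Made-up sequence: 5 leading bases, twenty AT repeats, then a short tail.
example = "AGATT" + "AT" * 20 + "ACATATG"
for line in show_hits(example):
    print(line)
```

Because the quantifier is greedy, the whole run of twenty repeats is reported
as one hit rather than as several overlapping shorter ones.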


The software is available by anonymous ftp from cbil.humgen.upenn.edu
in directory pub/dnaworkbench.


======================================================================
James Tisdall
Departments of Genetics and Computer and Information Science
Computational Biology and Informatics Laboratory, Human Genome Project
University of Pennsylvania

tisdall at cbil.humgen.upenn.edu
215-573-3113
fax 215-573-3111
======================================================================




More information about the Bio-soft mailing list