DNA WorkBench - update and examples

James Tisdall tisdall at amalthea.humgen.upenn.edu
Mon Dec 13 17:54:31 EST 1993

DNA WorkBench  - Update, Program Examples

In its first week of public release, almost 40,000 database accesses were
 made with DNA WorkBench.  Here is some additional information about a few
 problems with the initial distribution and the updated program.
 At the end I include an example of searching and computing with the program.

The updated program is available at the anonymous ftp site
cbil.humgen.upenn.edu in the directory /pub/dnaworkbench

Unix version is working well - the distribution now comments out an
 include file "sybperl.pl" that is not needed, and the restriction mapping
 feature now is a client-server function.

Mac version - a few bugs fixed, and much better installation instructions
 are now included.  DNA WorkBench likes to open as many as 20 or so
 socket connections to various servers, but the current version of Perl
 only permits about 8.  The new version of MacPerl, due out sometime
 around January 1, will permit 20 or more open sockets, I'm told.  I will
 announce that upgrade here when available.  At present, a search over
 all of GenBank for some arbitrary text, e.g. "text aids gball", will only
 access some of the libraries, and the program complains about the ones
 that can't be opened due to this limit.

PC version - unfortunately, an MSDOS port of Perl which supports sockets is
 still not available, but one is expected soon.  In the meantime, I've
 added code that bypasses the client-server calls, so the sequence manipulation
 functions are at least available.  Given the great popularity of TCP/IP
 sockets in the PC world, and the availability of public domain socket
 libraries, a port of Perl that includes them should appear soon.  I have
 been in touch with the authors of some of the ports about this, and I will
 announce it here when one is available.

Program Example:
Here is a real-life example of computing with DNA WorkBench.  A researcher
wanted to find all runs of 10 or more GA repeats in the almost 4,000 known
sequences from Arabidopsis thaliana.  The question, followed by a simple
solution:

> I am playing with dnawb. I need a bit of help. I grabbed all of
> the arabidopsis sequence and want to do (ga){10,} on all sequences.
> Can this search be done on the whole lot at one time?

Well, here is a short answer - yes.  Here is a long answer:

Use the command "rangeloop", which allows you to perform any sequence
of commands over a specified range of the workspace.

On the command line, do something like this:

dnawb -q -c 'organism arabidopsis ; rangeloop 1-$ ; { ; head ; regexp (ga){10,} ; }' | tee save.output

This runs the program quietly (-q suppresses prompts and some questions
in some interactive commands) and gives the program on the command line
(-c for "commandline").  The commands are separated by ";" and the whole
program is enclosed in single quotes.  The output is displayed on the
terminal and also saved in the file "save.output" (| tee save.output).

Alternatively, you could put the program into a file, say "myprog",
with contents:

org arabidopsis
rangeloop 1-$
regexp (ga){10,}

and then give the command:

dnawb -q < myprog | tee save.output

(To test this kind of thing, use e.g. rangeloop 1-2 to try it on just
a couple of entries first.)
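
As an illustration only (this is not part of DNA WorkBench and says
nothing about how it is implemented), the pattern (ga){10,} can be tried
out in a few lines of Python.  The function name, the toy sequences, and
the choice of case-insensitive matching are all assumptions made for this
sketch; DNA WorkBench itself may treat case differently.

```python
import re

# (ga){10,} : the two-letter unit "ga" repeated ten or more times in a row.
# Assumption: matching ignores case; DNA WorkBench may behave differently.
GA_REPEAT = re.compile(r"(?:ga){10,}", re.IGNORECASE)


def find_ga_repeats(sequences):
    """Return (name, start, repeat_count) for every run of 10+ GA units.

    `sequences` maps a sequence name to its DNA string.
    `start` is the 1-based position of the first base of the run.
    """
    hits = []
    for name, seq in sequences.items():
        for m in GA_REPEAT.finditer(seq):
            hits.append((name, m.start() + 1, len(m.group()) // 2))
    return hits


if __name__ == "__main__":
    # Made-up toy sequences, not real Arabidopsis entries.
    toy = {
        "seq1": "ttc" + "ga" * 12 + "cct",   # 12 repeats: reported
        "seq2": "acgt" + "ga" * 9 + "acgt",  # only 9 repeats: no hit
    }
    for name, start, count in find_ga_repeats(toy):
        print(name, start, count)
```

The quantifier {10,} means "ten or more", so a run of nine GA units is
skipped and a longer run is reported once, as a single hit.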

James Tisdall
Departments of Genetics and Computer and Information Science
Computational Biology and Informatics Laboratory, Human Genome Project
University of Pennsylvania
tisdall at cbil.humgen.upenn.edu
fax 215-573-3111
"Consider the child.  It can scream all day without becoming hoarse.
 This is true harmony."  -Lao Tse
