DNA WorkBench - Update, Program Examples
In its first week of public release, almost 40,000 database accesses were
made with DNA WorkBench. Here is some information about a few problems
with the initial distribution, and about the updated program.
At the end I include an example of searching and computing with the program.
The updated program is available at the anonymous ftp site
cbil.humgen.upenn.edu in the directory /pub/dnaworkbench.
Unix version - working well. The distribution now comments out the
unneeded include file "sybperl.pl", and the restriction mapping
feature is now a client-server function.
Mac version - a few bugs are fixed, and much better installation instructions
are now included. DNA WorkBench may open as many as 20 or so socket
connections to various servers at once, but the current version of MacPerl
only permits about 8. The new version of MacPerl, due out sometime
around January 1, will permit 20 or more open sockets, I'm told. I will
announce that upgrade here when it is available. At present, a search over
all of GenBank for arbitrary text, e.g. "text aids gball", will access only
some of the libraries, and will complain about those that can't be
opened because of this limit.
PC version - unfortunately, an MSDOS port of Perl that supports sockets is
still not available. In the meantime, I've added code that bypasses the
client-server calls, so that at least the sequence manipulation functions
are available. Given the popularity of TCP/IP sockets in the PC world and
the availability of public domain socket libraries, a port of Perl with
socket support should appear soon; I have been in touch with the authors
of some of the ports about this. I will announce it here when it is
available.
Program Example:
Here is a real-life example of computing with DNA WorkBench. A researcher
wanted to find all runs of 10 or more consecutive GA dinucleotides (GA
repeats) in the almost 4,000 known sequences from Arabidopsis thaliana.
The exchange below shows a simple solution:
Jim,
I am playing with dnawb. I need a bit of help. I grabbed all of
the arabidopsis sequence and want to do (ga){10,} on all sequences.
Can this search be done on the whole lot at one time?
Joe
Well, the short answer is yes. Here is the long answer:
Use the command "rangeloop", which allows you to perform any sequence
of commands over a specified range of the workspace.
On the command line, do something like this:
dnawb -q -c 'organism arabidopsis ; rangeloop 1-$ ; { ; head ; regexp (ga){10,} ; }' | tee save.output
This runs the program quietly (-q suppresses prompts, and some questions
asked by interactive commands) and supplies the program on the command line
(-c for "command line"). Commands are separated by ";", and the whole program
is enclosed in single quotes ('...') so that the shell does not interpret
characters such as $, { and (. The output is displayed on the terminal and
also saved in the file "save.output" (| tee save.output).
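A minimal Python sketch of the same search, done outside DNA WorkBench, may help show what the loop computes: for each entry, print its header (like "head"), then report every run matched by (ga){10,} (like "regexp"). It assumes the entries are already in memory as (header, sequence) pairs. The helper name find_ga_repeats, the 1-based coordinates, the case-insensitive matching and the toy data are all illustrative assumptions, not part of dnawb.

```python
import re

# Same pattern as the dnawb command "regexp (ga){10,}": ten or more
# consecutive "ga" dinucleotides. (?:...) is a non-capturing group.
# IGNORECASE is an assumption, since sequence files may use either case.
GA_REPEAT = re.compile(r"(?:ga){10,}", re.IGNORECASE)


def find_ga_repeats(entries):
    """Hypothetical helper: for each (header, sequence) pair, return the
    header plus a list of (start, end, repeat_count) tuples, one per GA run.
    Positions are 1-based and inclusive (an assumed convention)."""
    results = []
    for header, seq in entries:
        hits = []
        # finditer reports non-overlapping matches; the greedy {10,}
        # quantifier extends each match to the end of its run.
        for m in GA_REPEAT.finditer(seq):
            start, end = m.start() + 1, m.end()  # convert to 1-based inclusive
            hits.append((start, end, (end - start + 1) // 2))
        results.append((header, hits))
    return results


if __name__ == "__main__":
    # Made-up toy entries standing in for the Arabidopsis workspace.
    entries = [
        ("ENTRY1 toy sequence", "ttt" + "ga" * 12 + "ccc"),
        ("ENTRY2 toy sequence", "gagagagaga" + "tt"),  # only 5 repeats: no hit
    ]
    # Mimic "head" then "regexp": print each header, then its matches.
    for header, hits in find_ga_repeats(entries):
        print(header)
        for start, end, n in hits:
            print(f"  (ga)x{n} at {start}-{end}")
```

With the toy data, the first entry reports one run of 12 repeats at positions 4-27, and the second entry reports nothing.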
Alternatively, you could put the program into a file, say "myprog",
with contents:
org arabidopsis
rangeloop 1-$
{
head
regexp (ga){10,}
}
and then run:
dnawb -q < myprog | tee save.output
(To test a program like this, use e.g. rangeloop 1-2 to try it on just
a couple of entries.)
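One way to picture the rangeloop range is as a 1-based, inclusive slice of the workspace. In this reading, "$" stands for the last entry, so 1-$ covers everything and 1-2 covers a quick test. That meaning of "$" is an assumption drawn from the usage above, not from DNA WorkBench documentation. The hypothetical helper select_range below sketches this reading in Python.

```python
def select_range(entries, spec):
    """Hypothetical helper: return the entries named by a range spec such as
    "1-$" or "1-2". Numbers are 1-based and inclusive; "$" is assumed to mean
    the last entry in the workspace."""
    first_text, sep, last_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected a range like '1-$', got {spec!r}")
    count = len(entries)

    def to_number(text):
        # "$" is assumed to denote the last entry; otherwise a 1-based index.
        text = text.strip()
        return count if text == "$" else int(text)

    first, last = to_number(first_text), to_number(last_text)
    if not 1 <= first <= last <= count:
        raise ValueError(f"range {spec!r} out of bounds for {count} entries")
    # Convert to Python's 0-based, end-exclusive slice.
    return entries[first - 1:last]


if __name__ == "__main__":
    workspace = ["entry1", "entry2", "entry3", "entry4"]  # made-up entries
    print(select_range(workspace, "1-2"))  # ['entry1', 'entry2'] (quick test run)
    print(select_range(workspace, "1-$"))  # all four entries
```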
Jim
======================================================================
James Tisdall
Departments of Genetics and Computer and Information Science
Computational Biology and Informatics Laboratory, Human Genome Project
University of Pennsylvania
tisdall at cbil.humgen.upenn.edu
215-573-3113
fax 215-573-3111
======================================================================
"Consider the child. It can scream all day without becoming hoarse.
This is true harmony." -Lao Tse