In article <3644FD16.1C3418B1 at compusmart.ab.ca>, Andrew Gardner <andrewg at compusmart.ab.ca> writes:
>I am working on a BLAST client program. I am able to run BLAST 1
>searches over the web at NCBI by sending requests to the URL recommended
>in the documentation at ftp://ncbi.nlm.nih.gov/blast/blasturl/
After my signature you'll find such a beast. It's written in DCL,
so unless you're on a VMS system it won't do you much good directly, but
you should be able to read it to see how it works. Basically the trick
for writing command line clients (that is what you're doing, right?) for
web servers is to save the submission page from your browser and write
the client so that it can set every field on that page. Then use a
program like rep_client (which was in the demo on the NCBI site) to stuff
the request into the server and wait for the response. How you handle
waiting around for the response is OS specific though.
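For readers without VMS, here is a minimal Python sketch of the request
file that the DCL procedure below assembles for rep_client. The layout
(host line, POST line, one "FIELD value" line per non-empty field, BEGIN,
then the FASTA sequence) is read off the procedure itself. The name
build_request and its parameters are made up for illustration, and
nothing here is taken from rep_client's own documentation.

```python
# Hypothetical helper mirroring the request file SAF_BLAST.COM writes.
# The host and POST lines are copied verbatim from the DCL procedure;
# whether rep_client accepts exactly this layout today is an assumption.

def build_request(fields, fasta_text,
                  host="www.ncbi.nlm.nih.gov",
                  post_line="POST /cgi-bin/BLAST/nph-blast_report HTTP/1.0X"):
    """Return the text that would be fed to rep_client on standard input."""
    lines = [host, post_line]
    for name, value in fields.items():
        # Like the DCL loop, skip any field that was left empty.
        if value != "":
            lines.append(f"{name} {value}")
    lines.append("BEGIN")
    # The FASTA-formatted query is appended after the BEGIN marker.
    return "\n".join(lines) + "\n" + fasta_text


if __name__ == "__main__":
    request = build_request(
        {"PROGRAM": "blastn", "DATALIB": "nr", "EXPECT": "default", "EMAIL": ""},
        ">query\nACGTACGT\n",
    )
    print(request, end="")
```

The empty EMAIL field is dropped, just as the procedure only writes a
field when its value is not blank.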
I vaguely recall that some of the NCBI servers will send you back a file
in text mode, while others insist on HTML unless you tell them to email
the result, in which case you can get text. (They may have changed this
in the 10 months since I wrote the client.) The client below leaves a
text mode result file on disk.
Regards,
David Mathog
mathog at seqaxp.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech
**************************************************************************
$! SAF_BLAST.COM
$!
$! 7-JAN-1998, David Mathog, Biology Division, Caltech
$!
$! Command procedure to access the BLAST server via the web interface.
$!
$! See below for the symbols it looks for and command line format.
$!
$! subroutine to list program options
$!
$ listprogram: subroutine
$ type sys$Input
Select the program to run, your options are:
Program  Query against->      Database
------------------------------------------------
blastn   Nuc                  Nuc
blastp   Pep                  Pep
tblastn  Pep                  Nuc->Pep (6 frames)
tblastx  Nuc->Pep (6 frames)  Nuc->Pep (6 frames)
blastx   Nuc->Pep (6 frames)  Pep
$ exit
$ endsubroutine
$!
$! subroutine to list datalib options
$!
$ listdatalib: subroutine
$ type sys$Input
Select the datalib to search, your options are:
nr         nonredundant (Pep or Nuc)
month      all entries < 30 days old (Pep or Nuc)
swissprot  Swiss Protein (Pep)
dbest      nonredundant EST sequences (Nuc)
dbsts      nonredundant STS sequences (Nuc)
pdb        Sequences from 3D structure files (Pep)
vector     Vectors (Nuc)
kabat      Sequences of immunological interest (Pep)
mito       Mitochondrial (Nuc)
alu        Some ALU sequences from REPBASE (Nuc)
epd        Eukaryotic Promoter Database (Nuc)
yeast      S.cerevisiae genome (Nuc, or coding sequences, Pep)
gss        Genome survey sequences (Nuc)
htgs       High throughput genomic sequences (Nuc)
E.coli     E. coli genome (Nuc, or coding sequences, Pep)
$ exit
$ endsubroutine
$!
$! subroutine to list command line options, and symbols
$!
$ listcommand: subroutine
$ type sys$Input
Usage: @saf_blast P1 P2 P3
Where:
P1 name of query sequence file, such as "sequence.gcg"
P2 (Optional) Comma separated list for fields to prompt for
if they are not supplied by symbols (see below.)
Example: "EXPECT,CUTOFF" would prompt for the EXPECT and CUTOFF
values to use with the search. "OUTFILE" would prompt for
an output file name.
P3 (Optional) "Start,End" - limit the query to this region of
the sequence. Must be enclosed in double quotes. Examples:
"1000,2000" from 1000 to 2000 inclusive
"1000," from 1000 to the end
",2000" from 1 to 2000
"," the whole sequence
If you want to specify P3 and not P2 (or P1), you
must use this syntax: blast "" "" P3
If blast_FIELD symbols are defined they will override defaults.
The symbols this procedure looks for are:
blast_INFILE        The GCG formatted query sequence (input file)
blast_OUTFILE       Name for BLAST output file
blast_PROGRAM       blastn, blastp, tblastn, tblastx, blastx
blast_DATALIB       nr, month, swissprot, dbest, dbsts, pdb, vector,
                    kabat, mito, alu, epd, yeast, gss, htgs, E.coli
blast_EXPECT        default, or any floating point number
blast_CUTOFF        default, or any number >= 0
blast_MATRIX        default, BLOSUM62, PAM40, PAM120, PAM250, IDENTITY
blast_STRAND        both, top, bottom
blast_FILTER        default, none, dust, SEG, SEG+XNU, XNU
blast_HISTOGRAM     if set, a histogram is drawn, default is none
blast_NCBI_GI       if set, show NCBI gi numbers in output, default is not to
blast_DESCRIPTIONS  default, or any number >= 0
blast_ALIGNMENTS    default, or any number >= 0
blast_ADVANCED      other BLAST command line options
blast_EMAIL         send response via email to address it holds
blast_HTML          send response in HTML format
$ exit
$ endsubroutine
$!
$! define symbols for program used
$!
$ hereis = f$environment("PROCEDURE")
$ hereis = f$element(0,"]",hereis) + "]"
$ rep_client = "''hereis'rep_client"
$ FIELDS ="INFILE,OUTFILE,PROGRAM,DATALIB,EXPECT,MATRIX,STRAND,FILTER,HISTOGRAM,NCBI_GI,DESCRIPTIONS,ALIGNMENTS,ADVANCED,EMAIL,PATH"
$! path is a bit different, called EMAIL externally
$ PATH=""
$!
$!
$ promptfor = "''P2'"
$ promptfor = f$edit(promptfor,"COLLAPSE,UPCASE")
$!
$! INFILE
$!
$ askfor = "INFILE"
$ default = "''P1'"
$ blab = ""
$ gosub doprompt
$ if ("''INFILE'" .eqs. "")
$ then
$ call LISTCOMMAND
$ exit
$ endif
$!
$ if (P3 .nes. "")
$ then
$ startfrom = f$element(0,",",P3)
$ startfrom = f$EDIT(startfrom,"COLLAPSE")
$ endat = f$element(1,",",P3)
$ endat = f$EDIT(endat,"COLLAPSE")
$! string comparison: .eq. would convert both sides to integers
$ if(startfrom .eqs. ",")then goto badp3
$ if(startfrom .eqs. "")
$ then
$ section = ""
$ else
$ section = "/begin=''startfrom'"
$ endif
$! f$element returns "," when P3 has no comma; compare as strings
$ if(endat .eqs. ",")then goto badp3
$ if(endat .nes. "")
$ then
$ section = section + "/end=''endat'"
$ endif
$ endif
$ goto notbadp3
$!
$ badp3:
$ write sys$output "The range you specified [''P3'] is invalid"
$ Type sys$input
Use one of these forms:
"1000,2000" from 1000 to 2000 inclusive
"1000," from 1000 to the end
",2000" from 1 to 2000
"," the whole sequence
The double quotes on each side are MANDATORY
If you want to specify P3 and not P2 (or P1), you
must use this syntax: blast "" "" P3
$ exit
$!
$notbadP3:
$!
$ time = f$time()
$ killstring = f$cvtime(time,,"hour") + -
f$cvtime(time,,"minute") + -
f$cvtime(time,,"second") + -
f$cvtime(time,,"hundredth")
$ killfile = "KILL_" + killstring + ".seq"
$ comfile = "KILL_" + killstring + ".com"
$ mypid = f$getjpi("","PID")
$ back= f$extract(4,4,mypid)
$ subname = "K" + back + killstring
$!
$ tofasta/infile='infile'/out='killfile' 'section'/default
$!
$! PROGRAM
$!
$ checkstring="blastn blastp blastx tblastx tblastn"
$ if (f$type(BLAST_PROGRAM) .eqs. "")then promptfor = promptfor + ",PROGRAM"
$ topprogram:
$ askfor = "PROGRAM"
$ default = "blastn"
$ blab = "LISTPROGRAM"
$ gosub doprompt
$ PROGRAM = f$EDIT(PROGRAM,"COLLAPSE,LOWERCASE")
$ if(f$length(checkstring) .eq. f$locate(PROGRAM,checkstring))
$ then
$ write sys$Output "''PROGRAM' is not a valid option for PROGRAM"
$ goto topprogram
$ endif
$!
$! OUTFILE
$!
$!
$! come up with a name for the output file, if one is not supplied
$!
$ askfor = "OUTFILE"
$! clear blab so the PROGRAM help text is not shown when prompting here
$ blab = ""
$ default = infile
$! strip off any nasty characters which might be in it now
$! looks for [] or logical: and removes them
$!
$ tdefault = f$element(1,"]",default)
$ if (tdefault .eqs. "]")then tdefault = default
$ default = tdefault
$ tdefault = f$element(1,":",default)
$ if (tdefault .eqs. ":")then tdefault = default
$ default = tdefault + "."
$ default = f$element(0,".",default)
$ default = default + ".''PROGRAM'"
$ write sys$output "output will be ''default'"
$ gosub doprompt
$!
$! DATALIB
$!
$ checkstring="nr month swissprot dbest dbsts pdb vector kabat mito alu epd yeast gss htgs e.coli"
$ topDATALIB:
$ askfor = "DATALIB"
$ default = "nr"
$ blab = "listdatalib"
$ gosub doprompt
$ DATALIB = f$EDIT(DATALIB,"COLLAPSE,LOWERCASE")
$ if(f$length(checkstring) .eq. f$locate(DATALIB,checkstring))
$ then
$ write sys$Output "''DATALIB' is not a valid option for DATALIB"
$ goto topDATALIB
$ endif
$!
$! this next may or may not be necessary, BLAST documentation is unclear
$!
$ if(DATALIB .eqs. "e.coli")then DATALIB = "E.coli"
$!
$! EXPECT
$!
$ askfor = "EXPECT"
$ default = "default"
$ blab = ""
$ gosub doprompt
$!
$! CUTOFF
$!
$ askfor = "CUTOFF"
$ default = "default"
$ gosub doprompt
$!
$! MATRIX
$!
$ askfor = "MATRIX"
$ default = "default"
$ gosub doprompt
$!
$! STRAND
$!
$ askfor = "STRAND"
$ default = "both"
$ gosub doprompt
$!
$! FILTER
$!
$ askfor = "FILTER"
$ default = "default"
$ gosub doprompt
$!
$! HISTOGRAM
$!
$ askfor = "HISTOGRAM"
$ default = ""
$ gosub doprompt
$!
$! NCBI_GI
$!
$ askfor = "NCBI_GI"
$ default = ""
$ gosub doprompt
$!
$! HTML
$!
$ askfor = "HTML"
$ default = ""
$ gosub doprompt
$!
$! DESCRIPTIONS
$!
$ askfor = "DESCRIPTIONS"
$ default = "100"
$ gosub doprompt
$!
$! ALIGNMENTS
$!
$ askfor = "ALIGNMENTS"
$ default = "100"
$ gosub doprompt
$!
$! ADVANCED
$!
$ askfor = "ADVANCED"
$ default = ""
$ gosub doprompt
$!
$! EMAIL/PATH
$!
$ askfor = "EMAIL"
$ default = ""
$ gosub doprompt
$ if(EMAIL .nes. "")
$ then
$ PATH = EMAIL
$ EMAIL = "IS_SET"
$ endif
$!
$! now assemble the command file to send
$!
$!
$! create a stream-lf file because the sequence will also be stream lf,
$! and otherwise append generates warnings. STREAMLF must be a system
$! wide symbol that maps to something like:
$! create/fdl=shrdisk:[shared.misc]streamlf.fdl
$!
$ streamlf 'comfile'
$ open/append ofil: 'comfile'
$!
$ write ofil: "www.ncbi.nlm.nih.gov"
$ write ofil: "POST /cgi-bin/BLAST/nph-blast_report HTTP/1.0X"
$! write ofil: "WWW_BLAST_TYPE unfin_gen"
$!
$! skip INFILE,OUTFILE, those don't go to NCBI
$!
$ count = 2
$ allfields:
$ string = f$element(count,",",fields)
$ if(string .nes. ",")
$ then
$ value = 'string'
$ value = "''value'"
$ if(value .nes. "")then write ofil: "''STRING' ''VALUE'"
$ count = count + 1
$ goto allfields
$ endif
$ write ofil: "BEGIN"
$ close ofil:
$ append 'killfile' 'comfile'
$ delete 'killfile';
$!
$! run it in a subprocess so that we can keep track of it
$!
$! type 'comfile'
$ create 'outfile'
$ spawn/nowait/input='comfile'/output='outfile'/process='subname' -
run 'rep_client'
$!
$! find the darn subprocess!
$!
$ context = ""
$ findsub:
$ apid = f$pid(context)
$ if(apid .eqs. "")
$ then
$ write sys$Output "Fatal error, connection to NCBI died"
$ exit
$ endif
$ procname = f$getjpi(apid,"PRCNAM")
$ if(procname .nes. subname)then goto findsub
$!
$! apid is its ID, procname is its process name
$!
$ write sys$output "Now processing job, subprocess is ''APID'"
$
$ waiting:
$ procname = ""
$ define/user/nolog sys$error nla0:
$ define/user/nolog sys$output nla0:
$ procname = f$getjpi(apid,"PRCNAM")
$ deass/user sys$error
$ deass/user sys$output
$ if (procname .eqs. "")then goto done
$ time=f$time()
$ write sys$output "still waiting at ''time'"
$ wait 00:00:20.00
$ goto waiting
$!
$ done:
$ time=f$time()
$ write sys$output "BLAST job completed at ''time', results in ''OUTFILE'"
$ delete 'comfile';
$ exit
$!
$! prompt routine. Set response to default value, then override that
$! with symbol's value (if it exists), and lastly, override that with
$! a prompt, if it was asked for on the command line
$!
$ doprompt:
$ 'ASKFOR' = default
$ if (f$type(BLAST_'ASKFOR') .nes. "")then 'ASKFOR' = blast_'ASKFOR'
$!
$ if(f$locate(askfor,promptfor) .eq. f$length(promptfor))then return
$ if("''blab'" .nes. "")then call 'blab'
$ READ/PROMPT="Enter a value for ''ASKFOR': " sys$command response
$ 'askfor' = response
$ return
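
(End of SAF_BLAST.COM. What follows is not part of the command procedure.)

The doprompt routine above picks each value in three steps: start from the
default, let a blast_FIELD symbol override it, and let an interactive
prompt override both if the field was named in P2. A rough Python
equivalent is sketched here for readers who do not read DCL. Environment
variables stand in for DCL symbols, and resolve, prompt_for, env, and ask
are illustrative names, not anything from the original procedure. One
difference: the DCL tests for the field name with a substring search,
while this sketch uses exact membership.

```python
import os


def resolve(field, default, prompt_for=(), env=None, ask=input):
    """Pick a value for `field` in the same order doprompt uses."""
    env = os.environ if env is None else env
    # Step 1: start from the built-in default.
    value = default
    # Step 2: a BLAST_<FIELD> variable, if defined, overrides the default.
    symbol = f"BLAST_{field}"
    if symbol in env:
        value = env[symbol]
    # Step 3: a prompt, if the field was requested, overrides both.
    if field in prompt_for:
        value = ask(f"Enter a value for {field}: ")
    return value


if __name__ == "__main__":
    # No variable set and no prompt requested, so the default is used.
    print(resolve("EXPECT", "default", env={}))
```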