Using Perl's LWP to access SRS?

Gary Williams gwilliam at hgmp.mrc.ac.uk
Fri Jul 17 05:52:52 EST 1998


In article <6mpk80$kum$1 at jetsam.uits.indiana.edu>,
Don Gilbert <gilbertd at bio.indiana.edu> wrote:
>You may have better success with GET instead of POST, if
>you suitably url-encode the somewhat messy url's that can
>result:
>
>http://srs.ebi.ac.uk:5000/srs5bin/cgi-bin/wgetz?[libs%3D%7Bembl_SP_emblnew%7D-all:esterase*]+-e
>
>For instance this pulls all the reports from two libraries
>for a term using the -all field.  You can also use compound queries,
>joining [term]&[term].
>
>Knowing the 'getz' syntax for srs will help.  Knowing
>that _SP_ is a hidden SRS key for joining library names
>will help.  It is not hard to hack together a simpler interface
>than wgetz for automated queries, if you are running your
>own SRS server.  IUBio has one such (see the iubio.bio.indiana.edu/Genbank
>section).


I've got a slightly different problem to the one that has been discussed in this thread:

1) The program has been passed either an ID or an AccNumber; it doesn't know which.

2) I want to search EMBLNEW and EMBL.

3) I want to extract just one sequence from the databases: the latest
one from EMBLNEW or, failing that, the one from EMBL.


The URL:

http://srs5.hgmp.mrc.ac.uk/srs5bin/cgi-bin/wgetz?-sf+embl+-e+[libs={EMBLNEW_SP_EMBL}-id:$entry|libs={EMBLNEW_SP_EMBL}-acc:$entry]

looks as if it ought to work, but...

http://srs5.hgmp.mrc.ac.uk/srs5bin/cgi-bin/wgetz?-sf+embl+-e+[libs={EMBLNEW_SP_EMBL}-id:hsfau|libs={EMBLNEW_SP_EMBL}-acc:hsfau]

gives "arguments missing"

http://srs5.hgmp.mrc.ac.uk/srs5bin/cgi-bin/wgetz?-sf+embl+-e+[{EMBLNEW_SP_EMBL}-id:hsfau|{EMBLNEW_SP_EMBL}-acc:hsfau]

pulls out the sequence OK.

http://srs5.hgmp.mrc.ac.uk/srs5bin/cgi-bin/wgetz?-sf+embl+-e+[{EMBLNEW_SP_EMBL}-id:X65923|{EMBLNEW_SP_EMBL}-acc:X65923]

gives: Information: no entries found, query: 
"[{EMBLNEW EMBL}-id:X65923|{EMBLNEW EMBL}-acc:X65923]"

http://srs5.hgmp.mrc.ac.uk/srs5bin/cgi-bin/wgetz?-sf+embl+-e+[{EMBLNEW_SP_EMBL}-acc:X65923|{EMBLNEW_SP_EMBL}-id:X65923]

pulls out the sequence OK.

i.e. it only works if the first term of the query finds a match:
searching by ID first works when the entry matches an ID, and searching
by AccNumber first works when the entry matches an AccNumber.
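
Following Don's suggestion to url-encode the query, the form of the URL
that worked above could be assembled along these lines. This is a minimal
sketch in Python rather than Perl; build_url and its parameters are
hypothetical names, and it is an assumption that this server treats
percent-encoded braces and "|" the same as the raw characters. It does not
remove the order-dependence just described; it only makes the field order
easy to swap.

```python
from urllib.parse import quote

# Server and option string are copied from the URLs in this post.
BASE = "http://srs5.hgmp.mrc.ac.uk/srs5bin/cgi-bin/wgetz"
OPTIONS = "-sf+embl+-e+"


def build_url(entry, fields=("id", "acc"), libs=("EMBLNEW", "EMBL")):
    """Build a wgetz URL that ORs one term per field over the joined libraries.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # _SP_ is the hidden SRS key for joining library names (per Don's reply).
    lib_set = "{" + "_SP_".join(libs) + "}"
    terms = "|".join(f"{lib_set}-{field}:{entry}" for field in fields)
    query = f"[{terms}]"
    # Leave [ ] : - * readable, as in Don's example; encode { } and |.
    return f"{BASE}?{OPTIONS}{quote(query, safe='[]:-*')}"


if __name__ == "__main__":
    print(build_url("hsfau"))                          # ID term first
    print(build_url("X65923", fields=("acc", "id")))   # AccNumber term first
```

Swapping the fields tuple reproduces the id-first and acc-first variants
tried above without editing the URL by hand.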

http://srs5.hgmp.mrc.ac.uk/srs5bin/cgi-bin/wgetz?-sf+embl+-e+[{EMBLNEW_SP_EMBL}-id:AC003034|{EMBLNEW_SP_EMBL}-acc:AC003034]

pulls out the sequence twice: AC003034 is in both EMBL and EMBLNEW,
so both versions are returned.

Can anyone suggest an alternative to doing multiple SRS searches,
stopping when a sequence has been found?
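
For reference, the multiple-search fallback mentioned here (the approach
the question hopes to avoid) might look like the sketch below, again in
Python rather than Perl. The names fetch_first, single_term_url and
http_get are hypothetical. It assumes that a single-library query such as
[EMBLNEW-id:hsfau] is accepted by wgetz, and that a failed search can be
recognised by the "no entries found" or "arguments missing" text quoted
above appearing in the response body.

```python
from urllib.parse import quote
from urllib.request import urlopen

BASE = "http://srs5.hgmp.mrc.ac.uk/srs5bin/cgi-bin/wgetz"
# Messages quoted in this post; assumed to appear in the response body.
NO_HIT_MARKERS = ("no entries found", "arguments missing")


def single_term_url(lib, field, entry):
    """URL for one library and one field, e.g. [EMBLNEW-id:hsfau]."""
    query = f"[{lib}-{field}:{entry}]"
    return f"{BASE}?-sf+embl+-e+{quote(query, safe='[]:-')}"


def http_get(url):
    """Fetch a URL and return the body as text (Latin-1 is an assumption)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("latin-1")


def fetch_first(entry, fetch=http_get):
    """Try EMBLNEW before EMBL, ID before AccNumber; stop at the first hit.

    Returns (library, field, text) for the first successful search,
    or None if every search fails.
    """
    for lib in ("EMBLNEW", "EMBL"):
        for field in ("id", "acc"):
            text = fetch(single_term_url(lib, field, entry))
            if not any(marker in text for marker in NO_HIT_MARKERS):
                return lib, field, text
    return None
```

Because EMBLNEW is searched first, an entry present in both databases
(such as AC003034) would be returned once, from EMBLNEW, at the cost of up
to four requests per entry.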

I'm sure it ought to be possible; I must confess a woeful ignorance of
SRS syntax. 


Gary Williams                                     Tel: +44 1223 494522
mailto:G.Williams at hgmp.mrc.ac.uk            http://www.hgmp.mrc.ac.uk/
Bioinformatics,MRC HGMP Resource Centre,Hinxton,Cambridge, CB10 1SB,UK





More information about the Bio-srs mailing list
