[Bio-srs] Uniprot and EMBL question

Hamish McWilliam hpm at ebi.ac.uk
Mon Oct 10 09:38:59 EST 2005


Hi Iain,

> I am trying to return the embl entries for a list of uniprot entries.
> I use the following command.
> getz '(@testing > embl)'
> where the file testing contains:
> uniprot:CYGB_MOUSE
> uniprot:GLB1_SCAIN
> 
> The output is:
> EMBL:AK019410
> EMBL:MMU315163
> EMBL:BC055040
> 
> Is there any way of viewing the Uniprot IDs as well as the EMBL IDs?
> My ideal output would be
> EMBL:AK019410    UNIPROT:CYGB_MOUSE
> EMBL:MMU315163   UNIPROT:CYGB_MOUSE
> EMBL:BC055040    UNIPROT:CYGB_MOUSE
> 
> I have tried getz '(@testing > embl) > uniprot'
> but this only returns one entry, rather than three.
> 
> I want to parse out the results into individual files according to the
> uniprot id.
> 
> I believe it is possible using views and wgetz, but I would prefer not
> to use wgetz

A simple solution is to use a shell script to do the relevant 
processing. For example:

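   #!/bin/sh
   # printf yields a real tab; echo "\t" is not portable across shells.
   tab=$(printf '\t')
   # One getz call per id; the id is appended to each EMBL line it returns.
   for ln in $(cat testing); do
     getz "[$ln]>embl" | sed "s#\$#$tab$ln#"
   done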

This produces your desired result, but is inefficient for large lists of 
ids, since each id is processed by a separate getz call.

If your set of ids is the product of a query you could use an Icarus 
script to do the processing instead, and avoid some of the overhead 
involved in the getz calls.
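
Since you want the results in individual files according to the uniprot 
id, the tab-separated output of the shell script above can be split with 
awk. This is only a minimal sketch: the function name split_by_uniprot, 
the output directory argument and the ".txt" suffix are my own 
assumptions, not anything provided by SRS.

```shell
#!/bin/sh
# Hypothetical helper: reads "EMBL-id<TAB>uniprot-id" lines on stdin and
# appends each EMBL id to <outdir>/<uniprot name>.txt.
split_by_uniprot() {
  outdir=$1
  mkdir -p "$outdir"
  awk -F '\t' -v dir="$outdir" '
    NF >= 2 {
      name = $2
      sub(/^[^:]*:/, "", name)              # drop the "uniprot:" prefix
      gsub(/[^A-Za-z0-9_.-]/, "_", name)    # keep the file name safe
      file = dir "/" name ".txt"
      print $1 >> file                      # append, so reruns accumulate
      close(file)                           # avoid running out of open files
    }'
}
```

Assuming the loop above is saved as link.sh and this function is defined 
in the same shell, "sh link.sh | split_by_uniprot out" would leave one 
file per uniprot id in the directory out. Because the files are opened 
in append mode, empty the directory before rerunning.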

Hamish
-- 
============================================================
Mr Hamish McWilliam
European Bioinformatics Institute
Wellcome Trust Genome Campus
Hinxton, Cambridge, CB10 1SD, UK

URL: http://www.ebi.ac.uk/
============================================================