
Protein IDs

Martin Hilbers hilbers at lionbio.co.uk
Fri Jul 16 08:31:58 EST 1999

Peter Rice wrote:
> I spent a while this morning trying to track down entry
> '[TREMBLNEW-ID:AAD17483]' which disappeared when I updated
> It is now in SPTREMBL, and I found it eventually as
> What I would really like to do is find it by using the TREMBLNEW id
> which should appear as a Protein-ID (prd) in the protein databases and EMBL.
> Sadly, there is by default no Protein-ID index for swissprot,
> swissnew, sptrembl or tremblnew. Is anyone out there indexing
> Protein-IDs for the protein databases? It is there hiding in the DR
> lines. If there is no 'standard' way I can invent something.
> SRS5 (as EMBL is indexed on the EBI's FTP server) does index
> /protein_id in the feature table, but SRS6 only does /db_xref which is
> the obsolete pid not used in TREMBLNEW - so I can find it by
> '[embl-prd:AAD17483*]' (that "*" is a nuisance and very confusing to
> users - just because of the ".n" after - I would prefer to index just
> the prefix because you only ever get the 'latest' version of the
> protein in the database).
> Curiously, DATABANKS at EBI does not seem to include the EMBL feature
> fields in its index.

OK - it seems to make sense to index protein ids.
This is how you can do it in SRS6:

To index protein_id in embl:

Define a ProtID field in srsgen.i:

  $DF_ProtID=$SrsField:[ProteinID short:prd]

In embl.is, add the production:

  ftprd:     ~ {$In:[ftvals c:qual] $Out} 
               /\/protein_id="([0-9a-zA-Z]+)/ {$Wrt:[s:$1]} ~ 

(If you want to include the version number in the indexed value, use
"/\/protein_id="([0-9a-zA-Z\.]+)/" as the regular expression in the production.)
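To see what the two patterns capture, they can be tried outside SRS. The following is only a rough Python sketch, not Icarus: the helper name extract and the sample qualifier (including its ".1" version suffix) are assumptions made for illustration.

```python
import re

# Python equivalents of the two patterns from the ftprd production.
# These are illustrative translations, not SRS/Icarus code.
ID_ONLY = re.compile(r'/protein_id="([0-9a-zA-Z]+)')
ID_WITH_VERSION = re.compile(r'/protein_id="([0-9a-zA-Z.]+)')


def extract(pattern, qualifier):
    """Return the captured protein_id, or None if the qualifier does not match."""
    match = pattern.search(qualifier)
    return match.group(1) if match else None


# Assumed sample qualifier; the ".1" version suffix is made up for the example.
sample = '/protein_id="AAD17483.1"'
print(extract(ID_ONLY, sample))          # AAD17483
print(extract(ID_WITH_VERSION, sample))  # AAD17483.1
```

With the first pattern the index holds the bare identifier, so a query would not need the trailing "*" to skip the ".n" suffix.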

In embl.i, add the field to be indexed to $EmblFeature_Format:

  $Field:[$DF_ProtID token:ft index:str indexToken:ftpid tableToken:ftpid]

To index the protein_id field in swissprot/sptrembl, add this production
to swissprot.is:

  protid:    ~ {$In:[fields c:link] $Out} (tag 
		 (/EMBL;[^;]+; +([A-Z0-9]+)/ {$Wrt:[s:$1]} | ln )*)* ~

and add to the $SWISSPROT_FORMAT in swissprot.i the field:

  $Field:[$DF_ProtID index:str code:link indextoken:protid]
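The DR-line pattern in the protid production can be checked the same way. This is a Python sketch under assumptions, not Icarus: the function name protein_ids and the sample DR lines (including the accession X00000) are invented for illustration and may not match the layout of every release.

```python
import re

# Python equivalent of the pattern used in the protid production:
# skip the primary EMBL accession, then capture the protein ID
# up to (but not including) any ".n" version suffix.
DR_EMBL = re.compile(r'EMBL;[^;]+; +([A-Z0-9]+)')


def protein_ids(dr_lines):
    """Collect the protein IDs found on EMBL cross-reference lines."""
    ids = []
    for line in dr_lines:
        match = DR_EMBL.search(line)
        if match:
            ids.append(match.group(1))
    return ids


# Hypothetical DR lines; only the EMBL line should yield a value.
sample = [
    "DR   EMBL; X00000; AAD17483.1; -.",
    "DR   PROSITE; PS00000; EXAMPLE; 1.",
]
print(protein_ids(sample))  # ['AAD17483']
```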

And finally - to introduce hyperlinks:

add a new hyperlink definition to href.i:

  prdR:$Href:[$EMBL_DB field:$DF_ProtID]

edit swissprot.is - look for the occurrence of pidR, and replace it with prdR

We will incorporate these changes in the next release of SRS6.

Martin Hilbers

Martin Hilbers            Customer Support Specialist
LION Bioscience           Main:  +44 (0) 1223 224 700
Sheraton House            Phone: +44 (0) 1223 224 711
Castle Park               Fax:   +44 (0) 1223 224 701
Cambridge CB3 0AX        
UNITED KINGDOM            Email:hilbers at lionbio.co.uk
