
Protein IDs

Martin Hilbers hilbers at lionbio.co.uk
Fri Jul 16 08:31:58 EST 1999



Peter Rice wrote:
> 
> I spent a while this morning trying to track down entry
> '[TREMBLNEW-ID:AAD17483]' which disappeared when I updated
> SPTREMBL and TREMBLNEW.
> 
> It is now in SPTREMBL, and I found it eventually as
> '[SPTREMBL-ACC:Q9Z6I5]'
> 
> What I would really like to do is find it by using the TREMBLNEW id
> which should appear as a Protein-ID (prd) in the protein databases and EMBL.
> 
> Sadly, there is by default no Protein-ID index for swissprot,
> swissnew, sptrembl or tremblnew. Is anyone out there indexing
> Protein-IDs for the protein databases? It is there hiding in the DR
> lines. If there is no 'standard' way I can invent something.
> 
> SRS5 (as EMBL is indexed on the EBI's FTP server) does index
> /protein_id in the feature table, but SRS6 only does /db_xref which is
> the obsolete pid not used in TREMBLNEW - so I can find it by
> '[embl-prd:AAD17483*]' (that "*" is a nuisance and very confusing to
> users - just because of the ".n" after - I would prefer to index just
> the prefix because you only ever get the 'latest' version of the
> protein in the database).
> 
> Curiously, DATABANKS at EBI does not seem to include the EMBL feature
> fields in its index.
> 


OK, it makes sense to index protein IDs.
This is how you can do it in SRS6:

To index protein_id in embl:

Define a ProtID field in srsgen.i:

  $DF_ProtID=$SrsField:[ProteinID short:prd]

In embl.is, add the production:

  ftprd:     ~ {$In:[ftvals c:qual] $Out} 
               /\/protein_id="([0-9a-zA-Z]+)/ {$Wrt:[s:$1]} ~ 

(If you want the indexed value to include the version number, use
"/\/protein_id="([0-9a-zA-Z\.]+)/" as the regular expression in the production.)
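The difference between the two regular expressions can be checked outside SRS. The following is a minimal Python sketch, not SRS code. It assumes the production sees qualifier text roughly as it appears in the flat file. The FT lines and the helper name protein_ids are made up for illustration. Only AAD17483 comes from the thread.

```python
import re

# Patterns mirror the embl.is production; Python needs no "\/" escape.
# Unversioned: the character class excludes ".", so "AAD17483.1" -> "AAD17483".
PROTEIN_ID = re.compile(r'/protein_id="([0-9a-zA-Z]+)')
# Versioned: "." is allowed, so the whole "AAD17483.1" is captured.
PROTEIN_ID_VERSIONED = re.compile(r'/protein_id="([0-9a-zA-Z.]+)')


def protein_ids(ft_lines, versioned=False):
    """Hypothetical helper: collect protein_id values from feature-table lines."""
    pattern = PROTEIN_ID_VERSIONED if versioned else PROTEIN_ID
    ids = []
    for line in ft_lines:
        m = pattern.search(line)
        if m:
            ids.append(m.group(1))
    return ids


# Made-up sample qualifiers (only the protein ID is taken from the thread).
ft = [
    'FT                   /codon_start=1',
    'FT                   /db_xref="PID:g4324242"',
    'FT                   /protein_id="AAD17483.1"',
]
print(protein_ids(ft))                  # ['AAD17483']
print(protein_ids(ft, versioned=True))  # ['AAD17483.1']
```

Indexing the unversioned prefix is what lets a query such as '[embl-prd:AAD17483]' match without the trailing "*".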


In embl.i, add the field to be indexed to $EmblFeature_Format:

  $Field:[$DF_ProtID token:ft index:str indexToken:ftpid tableToken:ftpid]


To index the protein_id field in swissprot/sptrembl, add this production
to swissprot.is:

  protid:    ~ {$In:[fields c:link] $Out} (tag 
		 (/EMBL;[^;]+; +([A-Z0-9]+)/ {$Wrt:[s:$1]} | ln )*)* ~
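The swissprot.is production scans the DR lines for EMBL cross-references and keeps the protein ID, which is the third field. A rough Python model of the same regular expression follows. It assumes one DR line per string. The sample entry lines and the accessions AF000001, AF000002 and CAA00001 are invented placeholders, and dr_protein_ids is a hypothetical name.

```python
import re

# Same pattern as the swissprot.is production: after "EMBL;" skip the
# nucleotide accession, then capture the protein ID (uppercase letters and digits).
EMBL_DR = re.compile(r'EMBL;[^;]+; +([A-Z0-9]+)')


def dr_protein_ids(entry_lines):
    """Hypothetical helper: protein IDs from the DR EMBL lines of one entry."""
    ids = []
    for line in entry_lines:
        if not line.startswith('DR   '):
            continue  # only cross-reference lines are of interest
        m = EMBL_DR.search(line)
        if m:
            ids.append(m.group(1))
    return ids


# Made-up sample entry; a ".1" suffix, if present, is dropped by the pattern.
entry = [
    'ID   EXAMPLE_HUMAN',
    'DR   EMBL; AF000001; AAD17483; -.',
    'DR   EMBL; AF000002; CAA00001.1; -.',
    'DR   PIR; S00001; S00001.',
]
print(dr_protein_ids(entry))  # ['AAD17483', 'CAA00001']
```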

and add this field to $SWISSPROT_FORMAT in swissprot.i:

  $Field:[$DF_ProtID index:str code:link indextoken:protid]


And finally - to introduce hyperlinks:

add a new hyperlink definition to href.i:

  prdR:$Href:[$EMBL_DB field:$DF_ProtID]

edit swissprot.is: look for the occurrence of pidR and replace it with prdR.
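Conceptually, the prdR link turns the protein ID on a DR EMBL line into a lookup on the EMBL ProtID (prd) field. The sketch below shows that lookup as a query string in the '[embl-prd:...]' form used earlier in the thread. This is an assumption for illustration only: SRS builds its hyperlinks internally from href.i, and prd_link_query is a hypothetical name.

```python
import re

EMBL_DR = re.compile(r'EMBL;[^;]+; +([A-Z0-9]+)')


def prd_link_query(dr_line, databank='embl'):
    """Hypothetical model of a prdR link target: DR EMBL line -> prd query."""
    m = EMBL_DR.search(dr_line)
    if m is None:
        return None  # not an EMBL cross-reference, so no link
    return f'[{databank}-prd:{m.group(1)}]'


print(prd_link_query('DR   EMBL; AF000001; AAD17483; -.'))  # [embl-prd:AAD17483]
```

Because both sides index the unversioned ID, the link needs no "*" wildcard.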



We will incorporate these changes in the next release of SRS6.

Martin Hilbers

-- 
-----------------------------------------------------
Martin Hilbers            Customer Support Specialist
LION Bioscience           Main:  +44 (0) 1223 224 700
Sheraton House            Phone: +44 (0) 1223 224 711
Castle Park               Fax:   +44 (0) 1223 224 701
Cambridge CB3 0AX        
UNITED KINGDOM            Email:hilbers at lionbio.co.uk
-----------------------------------------------------




More information about the Bio-srs mailing list
