Peter Rice wrote:
> I spent a while this morning trying to track down entry
> '[TREMBLNEW-ID:AAD17483]' which disappeared when I updated
>
> It is now in SPTREMBL, and I found it eventually as
>
> What I would really like to do is find it by using the TREMBLNEW id
> which should appear as a Protein-ID (prd) in the protein databases and EMBL.
>
> Sadly, there is by default no Protein-ID index for swissprot,
> swissnew, sptrembl or tremblnew. Is anyone out there indexing
> Protein-IDs for the protein databases? It is there hiding in the DR
> lines. If there is no 'standard' way I can invent something.
>
> SRS5 (as EMBL is indexed on the EBI's FTP server) does index
> /protein_id in the feature table, but SRS6 only does /db_xref which is
> the obsolete pid not used in TREMBLNEW - so I can find it by
> '[embl-prd:AAD17483*]' (that "*" is a nuisance and very confusing to
> users - just because of the ".n" after - I would prefer to index just
> the prefix because you only ever get the 'latest' version of the
> protein in the database).
>
> Curiously, DATABANKS at EBI does not seem to include the EMBL feature
> fields in its index.

OK - it seems to make sense to index protein ids.
This is how you can do it in SRS6:

To index protein_id in embl:

Define a ProtID field in srsgen.i:

  $DF_ProtID=$SrsField:[ProteinID short:prd]

In, add the production:

  ftprd: ~ {$In:[ftvals c:qual] $Out}
  /\/protein_id="([0-9a-zA-Z]+)/ {$Wrt:[s:$1]} ~

(if you want to include the version number in the value to be indexed, use
"/\/protein_id="([0-9a-zA-Z\.]+)/" as regular expression in the production)

In embl.i, add the field to be indexed to $EmblFeature_Format:

  $Field:[$DF_ProtID token:ft index:str indexToken:ftpid tableToken:ftpid]
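
If you want to see what the two protein_id expressions capture before
rebuilding the index, here is a rough check in Python. It is only a sketch:
Python's re module stands in for the Icarus matcher, which may differ in
details, and the sample qualifier line (including the ".1" version) is made
up for illustration.

```python
import re

# The two expressions from the ftprd production, minus the Icarus /.../ delimiters.
WITHOUT_VERSION = re.compile(r'/protein_id="([0-9a-zA-Z]+)')
WITH_VERSION = re.compile(r'/protein_id="([0-9a-zA-Z\.]+)')

# Hypothetical EMBL feature-table qualifier line; the ".1" version is an assumption.
qualifier = 'FT                   /protein_id="AAD17483.1"'


def extract(pattern, text):
    """Return the first captured protein id, or None if the line has none."""
    match = pattern.search(text)
    return match.group(1) if match else None


without_version = extract(WITHOUT_VERSION, qualifier)
with_version = extract(WITH_VERSION, qualifier)

print(without_version)  # AAD17483
print(with_version)     # AAD17483.1
```

With the first expression the indexed value is the bare id, so a query would
not need the trailing "*"; with the second the ".n" suffix becomes part of
the indexed value.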

To index the protein_id field in swissprot/sptrembl, add this production

  protid: ~ {$In:[fields c:link] $Out} (tag
  (/EMBL;[^;]+; +([A-Z0-9]+)/ {$Wrt:[s:$1]} | ln )*)* ~

and add to the $SWISSPROT_FORMAT in swissprot.i the field:

  $Field:[$DF_ProtID index:str code:link indextoken:protid]
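
The DR-line expression in the protid production can be tried out the same
way. Again a sketch only: Python's re module stands in for Icarus, and the
DR lines are invented examples (the accession AF000000 and the PFAM line are
placeholders, not real entries).

```python
import re

# Expression from the protid production, minus the Icarus /.../ delimiters.
DR_EMBL = re.compile(r'EMBL;[^;]+; +([A-Z0-9]+)')

# Hypothetical DR lines; only the protein id AAD17483 comes from this thread.
dr_lines = [
    'DR   EMBL; AF000000; AAD17483.1; -.',
    'DR   PFAM; PF00000; Example; 1.',
]

# Collect the captured id from every EMBL cross-reference, skipping other DR lines.
protein_ids = [
    match.group(1)
    for line in dr_lines
    for match in DR_EMBL.finditer(line)
]

print(protein_ids)  # ['AAD17483']
```

Because "." is not in the character class, this expression stops before the
version suffix, so the value indexed for swissprot/sptrembl is the id
without ".n".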

And finally - to introduce hyperlinks:

add a new hyperlink definition to href.i:

  prdR:$Href:[$EMBL_DB field:$DF_ProtID]

edit - look for the occurrence of pidR, and replace it by prdR

We will incorporate these changes in the next release of SRS6.

Martin Hilbers
Martin Hilbers Customer Support Specialist
LION Bioscience Main: +44 (0) 1223 224 700
Sheraton House Phone: +44 (0) 1223 224 711
Castle Park Fax: +44 (0) 1223 224 701
Cambridge CB3 0AX
UNITED KINGDOM Email:hilbers at