Peter Rice wrote:
> I spent a while this morning trying to track down entry
> '[TREMBLNEW-ID:AAD17483]' which disappeared when I updated
> SPTREMBL and TREMBLNEW.
>
> It is now in SPTREMBL, and I found it eventually as
> '[SPTREMBL-ACC:Q9Z6I5]'
>
> What I would really like to do is find it by using the TREMBLNEW id
> which should appear as a Protein-ID (prd) in the protein databases and EMBL.
>
> Sadly, there is by default no Protein-ID index for swissprot,
> swissnew, sptrembl or tremblnew. Is anyone out there indexing
> Protein-IDs for the protein databases? It is there hiding in the DR
> lines. If there is no 'standard' way I can invent something.
>
> SRS5 (as EMBL is indexed on the EBI's FTP server) does index
> /protein_id in the feature table, but SRS6 only does /db_xref which is
> the obsolete pid not used in TREMBLNEW - so I can find it by
> '[embl-prd:AAD17483*]' (that "*" is a nuisance and very confusing to
> users - just because of the ".n" after - I would prefer to index just
> the prefix because you only ever get the 'latest' version of the
> protein in the database).
>
> Curiously, DATABANKS at EBI does not seem to include the EMBL feature
> fields in its index.
>
OK - it seems to make sense to index protein ids.
This is how you can do it in SRS6.

To index protein_id in embl:

Define a ProtID field in srsgen.i:

  $DF_ProtID=$SrsField:[ProteinID short:prd]

In embl.is, add the production:

  ftprd: ~ {$In:[ftvals c:qual] $Out}
    /\/protein_id="([0-9a-zA-Z]+)/ {$Wrt:[s:$1]} ~

(If you want to include the version number in the value to be indexed, use
"/\/protein_id="([0-9a-zA-Z\.]+)/" as the regular expression in the production.)

In embl.i, add the field to be indexed to $EmblFeature_Format:

  $Field:[$DF_ProtID token:ft index:str indexToken:ftpid tableToken:ftpid]
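
If you want to see what the two regular expressions actually capture before rebuilding the index, here is a minimal sketch in Python (outside SRS, so only the patterns themselves are taken from the production above). The function name, the sample feature line and its ".1" version suffix are made up for illustration.

```python
import re

# Patterns copied from the embl.is production; everything else is illustrative.
PREFIX_ONLY = re.compile(r'/protein_id="([0-9a-zA-Z]+)')
WITH_VERSION = re.compile(r'/protein_id="([0-9a-zA-Z\.]+)')


def protein_id(qualifier_line, keep_version=False):
    """Return the protein_id value that would be indexed, or None if absent."""
    pattern = WITH_VERSION if keep_version else PREFIX_ONLY
    match = pattern.search(qualifier_line)
    return match.group(1) if match else None


# Hypothetical feature-table qualifier line (the ".1" version is assumed).
line = 'FT                   /protein_id="AAD17483.1"'

print(protein_id(line))                      # value without the version
print(protein_id(line, keep_version=True))   # value with the version
```

With the first pattern the indexed value stops before the ".n" suffix, so a query on the bare id should not need the trailing "*".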

To index the protein_id field in swissprot/sptrembl, add this production
to swissprot.is:

  protid: ~ {$In:[fields c:link] $Out} (tag
    (/EMBL;[^;]+; +([A-Z0-9]+)/ {$Wrt:[s:$1]} | ln )*)* ~

and add to the $SWISSPROT_FORMAT in swissprot.i the field:

  $Field:[$DF_ProtID index:str code:link indextoken:protid]
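
The same kind of check works for the DR-line pattern. This is only a rough Python sketch of what the regular expression in the swissprot.is production picks out, not of how SRS walks the entry; the function name and the sample lines (including the nucleotide accession AF000000 and the PFAM line) are invented for the example.

```python
import re

# Pattern copied from the swissprot.is production above.
EMBL_XREF = re.compile(r'EMBL;[^;]+; +([A-Z0-9]+)')


def embl_protein_ids(entry_text):
    """Collect the protein ids found on EMBL cross-reference (DR) lines."""
    ids = []
    for line in entry_text.splitlines():
        if not line.startswith("DR"):
            continue  # only DR lines carry database cross-references
        match = EMBL_XREF.search(line)
        if match:
            ids.append(match.group(1))
    return ids


# Hypothetical entry fragment; AF000000 and the PFAM line are placeholders.
sample = """AC   Q9Z6I5;
DR   EMBL; AF000000; AAD17483.1; -.
DR   PFAM; PF00000; Example; 1.
"""

print(embl_protein_ids(sample))
```

Note that the character class [A-Z0-9] has no ".", so the version suffix is dropped here as well and only the prefix is written to the index.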

And finally, to introduce hyperlinks:

Add a new hyperlink definition to href.i:

  prdR:$Href:[$EMBL_DB field:$DF_ProtID]

Then edit swissprot.is: look for the occurrence of pidR and replace it with prdR.

We will incorporate these changes in the next release of SRS6.

Martin Hilbers
--
-----------------------------------------------------
Martin Hilbers Customer Support Specialist
LION Bioscience Main: +44 (0) 1223 224 700
Sheraton House Phone: +44 (0) 1223 224 711
Castle Park Fax: +44 (0) 1223 224 701
Cambridge CB3 0AX
UNITED KINGDOM Email:hilbers at lionbio.co.uk
-----------------------------------------------------