Thure Etzold wrote:
> > Dear SRS developers and fellow SRS server managers,
> >
> >While trying to implement a direct link between PIR and EMBL/GENBANK I
> >ran into trouble. After some searching I found that the cause was a bug
> >in the SRS software itself. It seems that when there is e.g. an Icarus
> >line :
> >$link:[@SWISSPROT_DB to:@?EMBL_DB token:'link|EMBL' toField:@DF_Accession]
> >not just tokens with code EMBL are put in the index, but all tokens.
> >
> this problem has been reported before and i had problems reproducing it. it
> seems that it does occur rather infrequently and not as a rule.
>
> >To convince yourself of this, try the following:
> >search in SWISSPROT the entry with ID HA12_MOUSE
> >then make the link to EMBL
> >you will see that besides the two correct EMBL entries you also find
> >A02201 which is not related to the sequence SWISSPROT:HA12_MOUSE but
> >happens to have the same accession number as the corresponding PIR
> >entry.
> >
> >SRS is great, but keeping it alive costs a lot of sweat and tears...
> very sorry about that but your error report helps a lot!
>
> regards
> thure

It may have something to do with the "-s unix" command line option
used when you do a srsbuild via srscheck. Have a look at this:
# srsbuild swissprot -l -nn
...reading links to "GENBANK"
...reading links to "EMBL"
...reading links to "PIR"
...reading links to "PDB"
...reading links to "OMIM"
...processing /usr/local/gcg/data/gcgswissprot/swissprot.ref
...processing /usr/local/gcg/data/gcgswissprot/swissprot.seq
...wrote link from "SWISSPROT" to "GENBANK"
valid references: 109413, invalid references: 158,
total number of links: 112514
...wrote link from "SWISSPROT" to "EMBL"
valid references: 109396, invalid references: 175,
total number of links: 112116
...wrote link from "SWISSPROT" to "PIR"
valid references: 47054, invalid references: 75,
total number of links: 39124
...wrote link from "SWISSPROT" to "PDB"
valid references: 5753, invalid references: 51,
total number of links: 5753
...wrote link from "SWISSPROT" to "OMIM"
valid references: 3760, invalid references: 0,
total number of links: 3760
...program "srsbuild" completed successfully.
Looks good. Test it:
# getz '[swissprot-id:HA12_MOUSE]>embl'
EMBL:MM190
EMBL:MMU47326
Now with "-s unix":
# srsbuild swissprot -l -nn -s unix
...reading links to "GENBANK"
...reading links to "EMBL"
...reading links to "PIR"
...reading links to "PDB"
...reading links to "REBASE"
...reading links to "OMIM"
...processing /usr/local/gcg/data/gcgswissprot/swissprot.ref
...processing /usr/local/gcg/data/gcgswissprot/swissprot.seq
...wrote link from "SWISSPROT" to "GENBANK"
valid references: 125252, invalid references: 123397,
total number of links: 128143
...wrote link from "SWISSPROT" to "EMBL"
valid references: 125155, invalid references: 123494,
total number of links: 127675
...wrote link from "SWISSPROT" to "PIR"
valid references: 49253, invalid references: 199396,
total number of links: 41352
...wrote link from "SWISSPROT" to "PDB"
valid references: 5753, invalid references: 242896,
total number of links: 5753
...wrote link from "SWISSPROT" to "OMIM"
valid references: 3766, invalid references: 244883,
total number of links: 3766
...program "srsbuild" completed successfully.
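The two runs can also be compared mechanically. The following is only a rough sketch in Python, not part of SRS: it assumes the three-line summary layout printed above is stable, and the names parse_link_stats and references_checked are made up for this illustration. It adds up valid and invalid references per target database, using the EMBL and PIR figures copied from the two logs.

```python
import re

# One link summary as printed by srsbuild in the logs above.
# Assumption: this three-line layout does not vary between runs.
LINK_RE = re.compile(
    r'wrote link from "(\w+)" to "(\w+)"\s+'
    r'valid references: (\d+), invalid references: (\d+),\s+'
    r'total number of links: (\d+)'
)

def parse_link_stats(log_text):
    """Return {target_db: (valid, invalid, total_links)} for one log."""
    stats = {}
    for _source, target, valid, invalid, total in LINK_RE.findall(log_text):
        stats[target] = (int(valid), int(invalid), int(total))
    return stats

def references_checked(stats):
    """Return {target_db: valid + invalid}, i.e. references examined."""
    return {db: valid + invalid for db, (valid, invalid, _total) in stats.items()}

# Figures copied from the run without "-s unix".
PLAIN_LOG = '''
...wrote link from "SWISSPROT" to "EMBL"
valid references: 109396, invalid references: 175,
total number of links: 112116
...wrote link from "SWISSPROT" to "PIR"
valid references: 47054, invalid references: 75,
total number of links: 39124
'''

# Figures copied from the run with "-s unix".
UNIX_LOG = '''
...wrote link from "SWISSPROT" to "EMBL"
valid references: 125155, invalid references: 123494,
total number of links: 127675
...wrote link from "SWISSPROT" to "PIR"
valid references: 49253, invalid references: 199396,
total number of links: 41352
'''

plain = references_checked(parse_link_stats(PLAIN_LOG))
unix = references_checked(parse_link_stats(UNIX_LOG))
print("without -s unix:", plain)
print("with -s unix:   ", unix)
```

In the "-s unix" figures the sum is the same for both targets, while in the plain run each target has its own, much smaller sum. A check like this could flag a suspicious build before anyone notices a wrong link.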
Now the system tries to match all of the roughly 250,000 references
against every database, and a test:
# getz '[swissprot-id:HA12_MOUSE]>embl'
EMBL:A02201
EMBL:MM190
EMBL:MMU47326
pulls out A02201, which should be a link to PIR.
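
To make the suspected mechanism concrete, here is a toy model in Python. It is not SRS code and says nothing about where in srsbuild the filter gets lost. The function build_links and both tables are invented for illustration. "ACC_1" and "ACC_2" are placeholders, since only the EMBL entry IDs are known here and not their accession numbers, and the model assumes the unrelated EMBL entry carries accession A02201.

```python
# Cross-references of one SWISSPROT entry as (database code, accession).
# "ACC_1" and "ACC_2" are placeholder accessions, not real ones.
HA12_MOUSE_REFS = [
    ("EMBL", "ACC_1"),
    ("EMBL", "ACC_2"),
    ("PIR", "A02201"),
]

# Toy accession index of the target database: accession -> entry ID.
EMBL_ACCESSION_INDEX = {
    "ACC_1": "MM190",
    "ACC_2": "MMU47326",
    "A02201": "A02201",  # assumed: unrelated entry sharing the PIR accession
}

def build_links(refs, target_code, target_index, filter_by_code=True):
    """Resolve references against target_index and return matching entry IDs.

    With filter_by_code=True only tokens carrying target_code are looked up,
    which is what token:'link|EMBL' is meant to achieve. With False, every
    token is looked up regardless of its database code.
    """
    links = []
    for code, accession in refs:
        if filter_by_code and code != target_code:
            continue  # token belongs to another database, skip it
        entry_id = target_index.get(accession)
        if entry_id is not None:
            links.append(entry_id)
    return sorted(links)

filtered = build_links(HA12_MOUSE_REFS, "EMBL", EMBL_ACCESSION_INDEX)
unfiltered = build_links(HA12_MOUSE_REFS, "EMBL", EMBL_ACCESSION_INDEX,
                         filter_by_code=False)
print("filtered:  ", filtered)
print("unfiltered:", unfiltered)
```

With the filter on, the model returns the two expected EMBL entries. With it off, the PIR accession also hits the EMBL index and A02201 shows up, the same pattern as in the two getz results.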
Cheers,
Martin
--
-------------------------------------------------------------------
| Martin Hilbers http://www.dci.clrc.ac.uk/Person.asp?m.p.hilbers |
| SEQNET | E-mail: m.p.hilbers at dl.ac.uk |
| Daresbury Laboratory | Tel: +44-1925-603492 |
| Daresbury, Warrington | Fax: +44-1925-603100 |
| Cheshire WA4 4AD | SEQNET is the UK national EMBNet node |
| United Kingdom | http://www.seqnet.dl.ac.uk/ |
-------------------------------------------------------------------