We have been adding sequences from patents to EMBL over the last year.
These have a reference of the form:
RN [1]
RP 1-51
RA ;
RT "DNA EXPRESSION SYSTEMS BASED ON ALPHAVIRUSES";
RL Patent number WO9210578-A/26, 25-JUN-1992.
XX
...
Such references were not parsed and were therefore not searchable.
I have corrected the parser in embl.sdl:
old:
-----
reference = ? ref | {~ ~ <not>};
ref = ~A-Za-z .\-_~ <wrt c=@JNL> ~0-9~ <wrt c=@VOL> ':'
~0-9~ <wrt c=@PP> '-' <app c=@PP> ~0-9~
'(' ~0-9~ <wrt c=@YEAR> ')';
-----
new:
-----
reference = ? patent | ref | {~ ~ <not>};
ref = ~A-Za-z .\-_~ <wrt c=@JNL> ~0-9~ <wrt c=@VOL> ':'
~0-9~ <wrt c=@PP> '-' <app c=@PP> ~0-9~
'(' ~0-9~ <wrt c=@YEAR> ')';
patent = 'PATENT NUMBER' <wrt> ~A-Z0-9\-/~ <wrt> ',' ~0-9A-Z\-~ '.';
-----
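For readers who do not know Icarus, the effect of the new rule can be approximated in Python. This is only a rough sketch, not the SRS parser: the regular expression and the function name index_tokens are hypothetical, and it assumes that the two <wrt> actions index the keyword and the patent number but not the date, and that indexed tokens are lowercased.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical approximation of the 'patent' rule; not the Icarus grammar itself.
# Groups: the keyword, the patent number, and the date.
PATENT_RL = re.compile(
    r"^RL\s+(Patent number)\s+([A-Z0-9\-/]+),\s*([0-9A-Z\-]+)\.$",
    re.IGNORECASE,
)


def index_tokens(rl_line):
    """Return the tokens assumed to be indexed for a patent RL line, or []."""
    m = PATENT_RL.match(rl_line.strip())
    if m is None:
        return []
    # Assumption: only the keyword and the patent number are written to the
    # index (the two <wrt> actions); the date is matched but not indexed.
    return [m.group(1).lower(), m.group(2).lower()]


tokens = index_tokens("RL   Patent number WO9210578-A/26, 25-JUN-1992.")

# Wildcard terms are emulated here with shell-style matching.
has_patent = any(fnmatchcase(t, "patent*") for t in tokens)
has_wo921 = any(fnmatchcase(t, "wo921*") for t in tokens)
print(tokens, has_patent, has_wo921)
```

A journal-style RL line does not match this pattern and yields an empty list, which corresponds to falling through to the ref alternative in the grammar.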
This now allows queries such as:
getz '[embl-ref: patent*]'
getz '[embl-ref: patent*] and [embl-ref: wo921*]'
Similar queries can be performed in our WWW and SRS-FTP interfaces.
Jeroen
--
==============================================================
. O . Jeroen Coppieters
. O O o O . Software Support
O O O O *o O O Jeroen.Coppieters at embl-ebi.ac.uk
O O O O( *o )O O (or jecop at ebi.ac.uk)
)O O O O o* O O( ++44 1223 494422
O O O O( o* )O O
)O O O O *o O O( EMBL Outstation EBI
O O O O( *o )O O (European Bioinformatics Institute)
)O @ O O o* O O( Hinxton Hall
O O O( o* )O(' Hinxton
` O( *o O ' Cambridge CB10 1RQ
` O ' UK
http://www.ebi.ac.uk
==============================================================