Dear all,
We have built the new TREMBL. You can get it from
felix.embl-heidelberg.de; the file is
pub/databases/trembl/trembl.dat
About indexing EMBL:
I noticed that the feature index got very large because the parser entered the
/translation= qualifiers (the protein translations) into the index. To prevent
this, edit embl.sdl, change the production 'qualifier' so that it includes
'translate', and then add the production 'translate':
qualifier = ecnumber | citation | translate | qual;
translate = '/TRANSLATION=' '\"' { ~\"~ <not> } '\"';
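To see why the translations inflate the index, here is a toy Python sketch. It is not SRS code. It assumes that an index word is simply a run of letters and digits and that the qualifier name matches case-insensitively, and the feature text and sequence are invented. The regex is only a rough analogue of the 'translate' production as I read it: the qualifier name, a quote, any run of non-quote characters, and the closing quote.

```python
import re

# Hypothetical feature qualifiers (FT prefixes removed); the sequence is made up.
FEATURE_TEXT = (
    '/gene="abcA"\n'
    '/product="hypothetical protein"\n'
    '/translation="MKTAYIAKQRQISFVKSHFSRQLEERLGLIEV\n'
    'QAPILSRVGDGTQDNLSGAEKAVQVKVK"\n'
)

# Rough regex analogue of the 'translate' production: qualifier name,
# opening quote, any non-quote characters (newlines included), closing quote.
# Case-insensitive matching is an assumption of this sketch.
TRANSLATION = re.compile(r'/translation="[^"]*"', re.IGNORECASE)

# In this sketch an index "word" is just a run of letters and digits.
WORD = re.compile(r'[A-Za-z0-9]+')


def index_words(text, skip_translation):
    """Return the set of words that would enter this toy feature index."""
    if skip_translation:
        # Drop the whole qualifier so none of its sequence lines are indexed.
        text = TRANSLATION.sub(" ", text)
    return set(WORD.findall(text))


before = index_words(FEATURE_TEXT, skip_translation=False)
after = index_words(FEATURE_TEXT, skip_translation=True)
print(len(before), sorted(before))
print(len(after), sorted(after))
```

On this sample the toy index shrinks from 8 to 5 words. In a real EMBL release every CDS feature carries its own translation, and its sequence fragments are almost all unique words, which is presumably why the feature index grew so large.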
Here is also a better production 'comment':
comment = { (~A-Za-z0-9~ <not>) | (~A-Za-z0-9~ <new>) };
That will also reduce the size of the index.
You can find out what enters the index by running
srsbuild -f features embl -d
Of course, that is not very useful, since EMBL starts with all those ESTs.
Once you have the index, you can run
getz '[embl-features:*]' -rep
which prints all the words that match '*'.
One last thing about SRS 5:
I will probably start distributing a beta version between mid-April and the end of April.
The main improvement will be in parsing, so that problems such as those above are much
easier to find and fix.
We are aware of the problems with indexing large databanks such as EMBL and GenBank. One way
out is to build the individual indices separately and merge them later; I will try to
implement that as soon as possible.
Regards,
Thure
--
===============================================================================
Thure Etzold | EMBL
E-mail: etzold at embl-heidelberg.de | Postfach 10.2209
Tel: (49) 6221 387529 | 69012 Heidelberg
Fax: (49) 6221 387517 | Germany