
indices for TREMBL, EMBL

Thure Etzold etzold at embl-heidelberg.de
Tue Mar 26 03:48:07 EST 1996


Dear all,

We have built the new TREMBL. You can get it from
felix.embl-heidelberg.de
as the file
pub/databases/trembl/trembl.dat

About indexing EMBL:

I noticed that the feature index got very large because the parser entered
the /translation= values (the protein translations) into the index. To
prevent this, edit embl.sdl, change the production 'qualifier' so that it
includes 'translate', and then add the production 'translate':

  qualifier  =  ecnumber | citation | translate | qual;
  translate  = '/TRANSLATION=' '\"' { ~\"~  <not> } '\"';
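To make the effect of that change concrete, here is a minimal Python sketch (not SRS code) of which qualifier words would end up in a feature index when /translation values are skipped, which is what the new 'translate' production is meant to achieve. The function name `feature_index_words`, the word-splitting rule (runs of letters and digits, upper-cased) and the simplified handling of quoted values are all assumptions for illustration.

```python
import re

# Hypothetical sketch, not the SRS parser: matches /name or /name=value,
# where value is either a quoted string or a run of non-blank characters.
QUALIFIER_RE = re.compile(r'/(\w+)(?:=("[^"]*"|\S+))?')

def feature_index_words(ft_lines, skip=("translation",)):
    """Return the set of words that would enter a feature index.

    ft_lines: EMBL feature-table lines starting with 'FT'.
    Qualifier values can span several FT lines, so the text after the
    'FT' prefix is joined first. Qualifiers named in `skip` are ignored,
    mirroring the intent of the 'translate' production.
    Simplification (assumption): doubled quotes inside values are not handled.
    """
    text = " ".join(line[2:].strip() for line in ft_lines if line.startswith("FT"))
    words = set()
    for match in QUALIFIER_RE.finditer(text):
        name, value = match.group(1), match.group(2)
        if name.lower() in skip or value is None:
            continue  # skipped qualifier, or a flag such as /pseudo
        words.update(w.upper() for w in re.findall(r"[A-Za-z0-9]+", value))
    return words
```

Calling it with `skip=()` shows how, without the fix, long and essentially unique strings from every CDS translation would be added to the index.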

Here is also a better production 'comment':

 comment    = {  (~A-Za-z0-9~ <not>)  |  (~A-Za-z0-9~ <new>) };

That will also reduce the size of the index.


You can find out what enters the index by running

srsbuild -f features embl -d 

Of course, that is not very useful, since EMBL starts with all those ESTs.

Once you have the index, you can run

getz '[embl-features:*]' -rep

This prints all the words that match '*'.
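As a rough illustration of what such a wildcard lookup does, the Python sketch below answers trailing-'*' patterns against a sorted list of index words using binary search. It assumes the index words are kept sorted and that '*' appears only at the end of the pattern; it is not how getz is implemented, and `wildcard_prefix` is a hypothetical name.

```python
import bisect

def wildcard_prefix(sorted_words, pattern):
    """Return the index words matching `pattern`.

    Hypothetical sketch: `sorted_words` is a sorted list of index words;
    a pattern ending in '*' is treated as a prefix query, anything else
    as an exact lookup. The bare pattern '*' therefore matches every word.
    """
    if not pattern.endswith("*"):
        # Exact lookup: binary search for the word itself.
        i = bisect.bisect_left(sorted_words, pattern)
        if i < len(sorted_words) and sorted_words[i] == pattern:
            return [pattern]
        return []
    prefix = pattern[:-1]
    # All words sharing the prefix sit in one contiguous run of the sorted list.
    start = bisect.bisect_left(sorted_words, prefix)
    end = start
    while end < len(sorted_words) and sorted_words[end].startswith(prefix):
        end += 1
    return sorted_words[start:end]
```

Keeping the words sorted means a prefix query only touches the matching run, whereas '*' alone walks the whole word list, which is why it is mainly useful for inspecting what the parser put into the index.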


One last thing about SRS5:

I will probably start distributing a beta version between mid-April and the end of April.
The main improvement will be in parsing, so that problems like those above will be much easier
to find and fix.

We are aware of the problems with indexing large databanks such as EMBL and GenBank. One way out
is to build the individual indices separately and merge them later; I will try to implement
that as soon as possible.
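The merge-later idea can be sketched generically in Python: build partial inverted indices (each word mapped to the entries containing it), for example from separate parts of the databank, then combine them in one sorted pass. This is an illustration of the general technique under stated assumptions, not the SRS implementation; the `(word, entry_ids)` layout and the name `merge_indices` are hypothetical.

```python
import heapq
from itertools import groupby

def merge_indices(*partial_indices):
    """Merge several partial inverted indices into one.

    Hypothetical sketch: each partial index is a list of (word, entry_ids)
    pairs already sorted by word. Postings for the same word coming from
    different partial indices are combined, de-duplicated and sorted.
    """
    def by_word(pair):
        return pair[0]

    merged = []
    # heapq.merge lazily interleaves the sorted inputs in word order.
    stream = heapq.merge(*partial_indices, key=by_word)
    for word, group in groupby(stream, key=by_word):
        ids = sorted({entry for _, entries in group for entry in entries})
        merged.append((word, ids))
    return merged
```

Because every partial index is already sorted, the merge is a single pass over all of them, and each part can be built on its own without holding the whole databank's index at once.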

Regards,
Thure


-- 
===============================================================================
Thure Etzold                                   | EMBL
E-mail: etzold at embl-heidelberg.de              | Postfach 10.2209
Tel: (49) 6221 387529                          | 69012 Heidelberg
Fax: (49) 6221 387517                          | Germany




More information about the Bio-srs mailing list
