Databank updates

J-L Risler risler at cgmvax.cgm.cnrs-gif.fr
Tue Apr 11 02:15:46 EST 1995


As most of you are aware, the growth in the size of the EBI and GenBank 
databanks is not a minor problem...
Here I maintain a GCG-formatted version of the EBI databank (I exclude 
the EST division, because ESTs are mostly used in BLAST searches at 
remote sites).
It appears that today, the cumulative weekly updates since the last CD-ROM 
release of EBI are as large as the last release itself....
This is due in part, of course, to the increasing number of newly 
determined sequences. But it is also due to a great number of ESTs and a 
fabulous number of "duplicates", where a duplicate is an entry in EBI 
that has been corrected or modified - thus a "duplicate" has the same 
accession number or ID in the full release and in the updates.
My question is: do you know of an efficient program which, starting from 
the EBI flat file and the weekly update flat files, will remove the 
redundant entries, keeping only the most recently updated version of 
each, and possibly remove the ESTs from the updates?
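
To make the request concrete, here is a minimal Python sketch of the kind 
of merge I have in mind. It is not an existing program. It assumes that 
each flat-file entry ends with a "//" line, that the first accession 
number on the first AC line identifies the entry, and that the division 
code (e.g. EST) appears as a ";"-separated field on the ID line. The 
function names (read_entries, primary_accession, is_est, merge) are 
hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def read_entries(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each entry as a list of lines, ending with its '//' line.

    Trailing lines with no '//' terminator are silently dropped.
    """
    entry: List[str] = []
    for line in lines:
        entry.append(line)
        if line.startswith("//"):
            yield entry
            entry = []


def primary_accession(entry: List[str]) -> Optional[str]:
    """Return the first accession number on the first AC line, if any."""
    for line in entry:
        if line.startswith("AC"):
            fields = line[2:].replace(";", " ").split()
            return fields[0] if fields else None
    return None


def is_est(entry: List[str]) -> bool:
    """Assumption: the division code is a ';'-separated field on the ID line."""
    for line in entry:
        if line.startswith("ID"):
            fields = [f.strip() for f in line[2:].split(";")]
            return "EST" in fields
    return False


def merge(
    release: Iterable[str],
    updates: Iterable[Iterable[str]],
    drop_est: bool = True,
) -> Dict[str, List[str]]:
    """Merge a release with its updates, oldest first.

    An entry read later replaces an earlier entry with the same accession
    number, so the most recent version of each entry is the one kept.
    """
    merged: Dict[str, List[str]] = {}
    for source in [release, *updates]:
        for entry in read_entries(source):
            acc = primary_accession(entry)
            if acc is None:
                continue  # no AC line: cannot be matched, so skip it
            if drop_est and is_est(entry):
                continue
            merged[acc] = entry
    return merged
```

Open file objects can be passed directly as the release and the updates, 
since they iterate line by line. This sketch keeps every entry in memory, 
which would not be practical for the full databank; a real tool would 
more likely record file offsets per accession number and copy the entries 
out in a second pass.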

Thank you for your help,


PS. I cross-post this message to both bionet.software.gcg and 

Jean-Loup RISLER
risler at cgmvax.cgm.cnrs-gif.fr
Centre de Genetique Moleculaire
91198  Gif sur Yvette  France

More information about the Bio-soft mailing list
