David Mathog (mathog at seqvax.caltech.edu) wrote:
: It would be *really* nice if the folks who maintain and distribute
: databases would make it a bit easier to automate updates. It isn't all
: that hard now, but it seems that every time I go to do a set of updates
: some file has been moved, or changed names, or is .txt.Z when it was just
: txt the previous time, or in one way or another mutated so as to break my
: retrieval software.
[...]
: Comments?
Comment 1:
The proposed stubs try to compensate for the weakness of a transaction based
on a poll mechanism without query. Instead of letting the customer ask in
detail for what is needed, the current FTP scheme (1) requires the FTP site
to keep its file names fixed (btw, what does your software do if a new
division appears?), (2) requires it to be prepared in advance for any
request, be it day-by-day, week-by-week, or other, and (3) does not at all
tackle the question of what to do with the data once retrieved. The proposed
scheme does not mention whether the target is to update (1) formatted (which
format?), (2) unformatted or (3) incremental data. The latter in particular
is the most appropriate on wide area networks because it saves bandwidth,
but it raises management problems at the local site.
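To illustrate the local management problem with incremental data, here is a
minimal sketch in Python. The function name `apply_increment`, the in-memory
dictionary standing in for the local copy, and the 'upsert'/'delete' layout
of an increment are all assumptions made for illustration; they do not
describe any existing exchange format or tool.

```python
def apply_increment(local, increment):
    """Apply one incremental update to a local copy, in place.

    local:     dict mapping entry ID -> entry text (hypothetical local store)
    increment: dict with 'upsert' (ID -> new text) and 'delete' (list of IDs);
               this layout is an assumption, not an existing format
    Returns three sorted lists: IDs added, IDs replaced, IDs removed.
    """
    added, replaced, removed = [], [], []
    for entry_id, text in increment.get("upsert", {}).items():
        # An ID already present locally is a replacement, otherwise it is new.
        (replaced if entry_id in local else added).append(entry_id)
        local[entry_id] = text
    for entry_id in increment.get("delete", []):
        # Deletions of entries we never had are silently ignored.
        if local.pop(entry_id, None) is not None:
            removed.append(entry_id)
    return sorted(added), sorted(replaced), sorted(removed)
```

Even in this toy form the site has to track replacements and deletions, not
just new arrivals, and every formatted copy derived from the data has to be
rebuilt for the affected entries.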
Comment 2:
The proposed scheme seems to imply resource discovery based on static
listings. These are notoriously difficult to maintain and do not necessarily
offer any quality control (see below). Even if this were possible, it would
imply that all sites referencing each other share the same 'free for all'
policy and honor the same quality standards.
Comment 3:
The major problem in sequence database updating is that the quality of
an update cannot be judged if you download it as a file. Neither the date nor
the contents are sufficiently characterized by the file format. Some form of
synchronisation is required (or at least desirable) that allows you to
cross-check the contents of your local, adapted, formatted copy against the
original data at the provider. Versions and dates are nice but insufficient
to characterize the contents of incremental updates.
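One way such a cross-check could look is a per-entry digest comparison,
sketched below in Python. This is an illustration under stated assumptions,
not a description of any existing tool: the function names `entry_digests`
and `compare` are made up, the choice of SHA-256 is arbitrary, and entries
are assumed to be EMBL-style flat-file records that start with an "ID" line
and end with a "//" line.

```python
import hashlib

def entry_digests(flatfile_text):
    """Map entry ID -> SHA-256 hex digest of the entry's lines.

    Assumes each entry starts with an 'ID' line and ends with a '//' line.
    """
    digests, current_id, lines = {}, None, []
    for line in flatfile_text.splitlines():
        if current_id is None:
            parts = line.split()
            if line.startswith("ID ") and len(parts) >= 2:
                current_id = parts[1].rstrip(";")
                lines = [line]
        elif line.strip() == "//":
            body = "\n".join(lines).encode("utf-8")
            digests[current_id] = hashlib.sha256(body).hexdigest()
            current_id = None
        else:
            lines.append(line)
    return digests

def compare(local, remote):
    """Compare two ID -> digest maps.

    Returns sorted lists of IDs missing locally, IDs present only locally,
    and IDs present on both sides whose digests differ.
    """
    missing = sorted(set(remote) - set(local))
    extra = sorted(set(local) - set(remote))
    changed = sorted(k for k in set(local) & set(remote)
                     if local[k] != remote[k])
    return missing, extra, changed
```

The provider would publish the digest map, and the local site would compute
its own and compare. Since the local copy is usually adapted and reformatted,
both sides would have to compute the digest over the same canonical form of
an entry, which is exactly the part that needs agreement between sites.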
Comment 4:
The proposed scheme requires a considerable amount of coordination between
providers. The resources for the update business are fairly low, as you, and
many others, are neither prepared nor willing to pay for the service you
request. I'm not saying that all researchers ought to be tapped for money,
but as I had to realize recently, lots of our customers don't even know
where the data come from, and when asked for funding, the granting
organizations want to bless just these customers with 'self-sustaining'
fees to cover the costs of such a service. In other words, even if
items 1-3 could be resolved, money remains tight.
Comment 5:
The market is fairly small. In Europe, we have 26 EMBnet nodes mirroring the
EMBL database, and presumably about 100 sites that care about updates. The
recent referee's comment on a paper we submitted to XXXXX about a new
compression system was (quote) "Not many sites update their databases
regularly over the Internet". This might be a guess on the low side, but
you will not be able to count on commercial providers to help you develop
such a mechanism easily.
Comment 6:
We are aware of all these problems. We deal with them on a daily basis, as
we update all available sequence databases on a daily or weekly schedule.
We have developed suitable mechanisms to receive, distribute and redistribute
updates. The reason why we haven't made it a 'release' yet is the resource
issue: it works for us, and that does not necessarily require us to make the
methods publicly available. I am currently finishing HASSLE with SRS
access but afterwards will try to work on the DBTOOLS for release in early
summer (sorry, it was originally scheduled for fall '94 but was postponed).
This is symptomatic: once something works at the local site, any effort to
make it usable on a wider basis is not profitable for the individual site
and is therefore rarely undertaken.
Regards
Reinhard Doelz
EMBnet Switzerland
--
R.Doelz Klingelbergstr.70| Tel. x41 61 267 2247 Fax x41 61 267 2078|
Biocomputing CH 4056 Basel| electronic Mail doelz at ubaclu.unibas.ch|
Biozentrum der Universitaet Basel|-------------- Switzerland ---------------|
EMBnet Switzerland: info at ch.embnet.org   http://beta.embnet.unibas.ch/