I think the notion of the distributed node approach is appropriate;
however, I believe it will ultimately fail if restricted to current
flatfile systems, because of the denormalization that this
involves. Again, this is based on our existing experience with just
such nodes. Unlike Medline (I think), at least 30% of our daily
"output" here involves changes (updates, error corrections,
data-representation improvements) to the data. To take an example:
today, with the exception of the RDBMS satellites, in order for us to
propagate the effects of a change in, say, the spelling of one
taxonomic node that in turn affects thousands of flatfile entries, we must
either redistribute those thousands of affected entries to relay the single
change, or wait for the next quarterly release for the distribution
sites to catch up. (Recent attempts to redistribute the 2700 Venter
cDNA sequences because of a single clerical spelling correction caused
havoc at remote flatfile nodes.) If you develop flatfile-based
distribution nodes, ultimately you will only be distributing the
problem, not curing it!
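
To make the denormalization point concrete, here is a minimal sketch in
Python -- the accession names, the toy taxonomy table, and the misspelling
are all made up for illustration; only the 2700 count is borrowed from the
Venter example above. It contrasts how a flatfile node has to re-ship every
affected entry while a normalized store changes a single record:

    # Denormalized flatfile view: every entry carries its own copy of the
    # organism name, so one spelling fix means rewriting and redistributing
    # every affected record. "Homo sapeins" stands in for the clerical error.
    flatfile_entries = [
        {"accession": f"HUMXYZ{i:04d}", "organism": "Homo sapeins", "sequence": "ATGC"}
        for i in range(2700)
    ]

    def fix_spelling_denormalized(entries, old, new):
        changed = [e for e in entries if e["organism"] == old]
        for e in changed:
            e["organism"] = new
        return changed            # all of these must be re-sent to remote nodes

    # Normalized view: entries reference one taxonomy record by ID, so the
    # fix touches one row and only that record needs to be relayed downstream.
    taxonomy = {9606: "Homo sapeins"}
    normalized_entries = [
        {"accession": f"HUMXYZ{i:04d}", "taxon_id": 9606, "sequence": "ATGC"}
        for i in range(2700)
    ]

    def fix_spelling_normalized(taxonomy, taxon_id, new):
        taxonomy[taxon_id] = new
        return [taxon_id]         # a single record to relay

    print(len(fix_spelling_denormalized(flatfile_entries, "Homo sapeins", "Homo sapiens")))  # 2700
    print(len(fix_spelling_normalized(taxonomy, 9606, "Homo sapiens")))                      # 1
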
I guess what (all) I am saying, without trying to promote any one
specific solution, is that today's technologies are a reasonable
compromise: our on-line systems and our flatfile distribution
mechanisms, both tape image and remote node, are working fine so far,
but we are pushing the envelope and will likely need fresh technology
to move beyond what could be severe downstream limitations of these
mechanisms.
Yes, sure we are biased, Dave, but heck, at least we've got the
experience to prove it :-)
Yup, 600MB is right--I'd give it a year or two at best before we need
higher densities or jukebox-type drives.
--paul