In article <1992Oct27.072456.13689 at nic.funet.fi>, harper at convex.csc.FI (Rob Harper) writes:
|>|> So I would like to ask how people feel about mirroring of resources.
|> Is it enough that everyone in Europe pounds Don's GenBank resource.
|> Is it enough that everyone in USA pounds Reinhard's EMBL resource...
|> and everyone in the world jumps on Dan's PIR resource, and when
|> these centres run into service glitches then nobody has anywhere to
|> go. Do we need duplication??? What should we duplicate??? The BIG
|> databases or the smaller ones like ace and aat??? Who is going to
|> have the disk space to provide the service???
|>
Good point. Let me tell you that the GOPHER indices grow as fast as the
databases do. Because GOPHER is free, people think that it is cheap.
Well, nearly... the indices for the EMBL database alone come to about
50 MByte:
4490 /bioy/gopher-data/index/embl/fun
9925 /bioy/gopher-data/index/embl/inv
4038 /bioy/gopher-data/index/embl/mam
3384 /bioy/gopher-data/index/embl/org
1134 /bioy/gopher-data/index/embl/phg
5242 /bioy/gopher-data/index/embl/pln
22777 /bioy/gopher-data/index/embl/pri
12398 /bioy/gopher-data/index/embl/pro
17139 /bioy/gopher-data/index/embl/rod
1827 /bioy/gopher-data/index/embl/syn
3232 /bioy/gopher-data/index/embl/una
9898 /bioy/gopher-data/index/embl/vrl
4705 /bioy/gopher-data/index/embl/vrt
100191 /bioy/gopher-data/index/embl
Plus the daily updates:
18858 /bioy/gopher-data/index/xembl
14041 /bioy/gopher-data/index/xxembl
As mentioned earlier, hosts or resources which are 'down' are frustrating.
Therefore, during the daily update procedure, I keep the databases DOUBLE
and rename the paths only after a crosscheck. This currently gets me to
a 100 MB allocation for EMBL database indexing alone. If you are going to
tell me that this is little, I agree, but remind you that the BLAST and GCG
formatted databases also take their share, not to mention the resources
needed to keep the CD-ROMs mounted (currently 3 drives at the bioftp server).
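
The double-copy update with crosscheck and rename can be sketched roughly
as below. This is only an illustration in Python, not the procedure actually
run on bioftp: the function names (crosscheck, publish, update), the ".new"
and ".old" suffixes, and the very simple check are all assumptions made up
for the example.

```python
import os
import shutil

def crosscheck(new_dir, min_files=1):
    """Hypothetical sanity check: need at least min_files non-empty files."""
    paths = [os.path.join(new_dir, name) for name in os.listdir(new_dir)]
    good = [p for p in paths if os.path.isfile(p) and os.path.getsize(p) > 0]
    return len(good) >= min_files

def publish(live_dir, new_dir):
    """Rename the checked copy into place, then drop the previous copy."""
    old_dir = live_dir + ".old"
    if os.path.exists(old_dir):
        shutil.rmtree(old_dir)
    if os.path.exists(live_dir):
        os.rename(live_dir, old_dir)   # users see the old index up to here
    os.rename(new_dir, live_dir)       # from here on they see the new one
    if os.path.exists(old_dir):
        shutil.rmtree(old_dir)         # free the doubled disk space again

def update(live_dir, build):
    """Build a second copy next to the live one, check it, then rename."""
    new_dir = live_dir + ".new"
    if os.path.exists(new_dir):
        shutil.rmtree(new_dir)
    os.makedirs(new_dir)
    build(new_dir)                     # caller-supplied indexing step
    if not crosscheck(new_dir):
        shutil.rmtree(new_dir)
        return False                   # live copy stays untouched
    publish(live_dir, new_dir)
    return True
```

The point of the sketch is the cost: while build() runs, both copies exist
on disk, which is where the doubled allocation comes from.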
In total, including scratch and redundant archive space, I currently allocate
5-6 Gigabytes of disk on several computer systems for providing GCG,
BLAST, and GOPHER. We anticipate a doubling time of 16 months for EMBL.
It's time to think about who doubles what. Certainly, the small sites
will lose out on material grounds alone. I worry at times whether I will
become a small site sooner or later, because our funding agencies might not
like to throw money at a center which serves the world rather than
exclusively its own guys.
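
To put rough numbers on "who doubles what": with a doubling time of 16
months, a size S grows to S * 2^(t/16) after t months. The sketch below
evaluates this for the EMBL index. Two things in it are assumptions, not
statements from the post: that the listing above counts 512-byte blocks
(which would make the 100191 total agree with "about 50 MByte"), and that
the index grows at the same rate as EMBL itself.

```python
BLOCK_BYTES = 512  # assumption: the listing counts 512-byte blocks

def blocks_to_mbyte(blocks, block_bytes=BLOCK_BYTES):
    """Convert a block count to MByte (10**6 bytes)."""
    return blocks * block_bytes / 1e6

def projected_size(size_now, months, doubling_months=16.0):
    """Size after `months`, given a fixed doubling time in months."""
    return size_now * 2.0 ** (months / doubling_months)

embl_index = blocks_to_mbyte(100191)   # about 51 MByte
doubled = 2 * embl_index               # two copies during the daily update

for months in (0, 16, 32, 48):
    size = projected_size(doubled, months)
    print(f"{months:3d} months: {size:7.1f} MByte")
```

Under these assumptions the index allocation alone grows eightfold within
four years, before counting the BLAST and GCG copies.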
|> 2) Lost in gopher-space... How many people have had this experience?
|> You read about a resource. You track it down. You negotiate all
|> the menus, and finally you reach your destination. You fail to make
|> a bookmark, and the next time you try to navigate to the same place
|> you can never find it. What can we do to make moving about in
|> gopher-space a "memorable" experience. Do we need a standard bio-gopher
|> interface (/databases, /software, /hints) that looks the same on
|> every machine, or do we need weird flashing neon lights and steam
|> whistles (Desperately Seeking Susan) to provide hooks that jolt
|> our memories into remembering "hey I've been in this place before".
|> I have gone in the latter direction, renaming "Name=" to something
|> more graphic and descriptive than I have found in the original link.
|>
Bookmarks are fine toys to keep this sorted out. It certainly is
necessary to organize a 'who has what' database. This should then be
searchable via a fanout mindexer like the one I already use for the
EMBL subsections. I think we should have an index-type gopher sitting
at some (agreed) free port, addressable by such a 'who has what'
gopher. The problem is updating, though. It would be cute to have
an archie-type polling for a .resources file, so that the database could
be maintained automatically. Even automatic sorting would be possible.
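
A minimal sketch of such polling, in Python. No .resources format exists
yet, so the one used here (one "name<TAB>selector" pair per line, "#" for
comments) is purely hypothetical, as are the function names. Fetching is
passed in as a callable, so the sketch says nothing about how a real poller
would reach each host.

```python
def parse_resources(text):
    """Parse the hypothetical format: 'name<TAB>selector', '#' = comment."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, selector = line.partition("\t")
        entries.append((name.strip().lower(), selector.strip()))
    return entries

def build_index(fetch, hosts):
    """Poll each host via fetch(host); return {resource: [(host, selector)]}."""
    index = {}
    for host in hosts:
        try:
            text = fetch(host)
        except OSError:
            continue                   # host is down: skip it this round
        for name, selector in parse_resources(text):
            index.setdefault(name, []).append((host, selector))
    # sort resource names and, per resource, the hosts offering it
    return {name: sorted(where) for name, where in sorted(index.items())}
```

Looking up index["embl"] would then list every polled host that announces
an EMBL resource, which is the 'who has what' answer in its simplest form.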
Next, I would appreciate hearing who has what and who has links. Quite a
few gopher holes around the globe have an 'About Gopher' item locally and
the rest are links. While this seems most comfortable, it hides the cost
that the poor guys out there incur in providing the services.
Classification of services is also an issue. It helps little if someone
puts up a server and doesn't update it regularly. From my own experience
I know that some activities are short-lived at best. However, services
like Don's GenBank etc. become a sort of standard in the habits of some of
us, and browsing through the accounting logs I find that about 80% of the
queries come from 10% of the nodes. I would appreciate it if these nodes
thought about setting up their own resources and/or making them available
to the public, checked against the established data sets to make sure that
DUPLICATION and not imitation is the goal. Nothing is worse than a bad
copy!
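
How such a figure can be pulled out of an accounting log is sketched below
in Python. The input is an assumption: a plain list with one host name per
logged query, already extracted from whatever the real log format is. The
function name is made up for the example.

```python
from collections import Counter

def share_of_top_nodes(hosts, top_fraction=0.10):
    """Fraction of all queries issued by the busiest top_fraction of nodes."""
    counts = Counter(hosts)            # queries per node
    if not counts:
        return 0.0
    n_top = max(1, round(len(counts) * top_fraction))
    top = counts.most_common(n_top)    # the heaviest nodes first
    return sum(n for _, n in top) / sum(counts.values())
```

A result near 0.8 for top_fraction=0.10 corresponds to the "80% of the
queries from 10% of the nodes" pattern, and counts.most_common() also
names the nodes that would be the natural candidates for a local copy.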
In summary, it is not a bad idea to have a schema. However, the manpower
needed to set it up (AND MAINTAIN IT!!!) is more than I can currently get
funded for such a project. The worst thing in all this is that the
individual sites depend heavily on local funds and are thus very much
restricted in setting up 'global' services. I currently don't block any
access, but over the last month we had several gigabytes coming off the
bioftp gopher/ftp/hassle system, and this is causing some people to
think of future access restrictions. Again, I currently have no intention
of blocking access, but in the future service providers like us will need
to think about accounting.
Accounting for networked access would, for example, include sending
warnings to hardcore users. If these are from non-provider sites,
they might need to contribute funds, or get blocked after a certain
volume (let's say, after 250 MBytes). You are right in saying that
there are a lot of GOPHER questions to be asked, but some of our
non-Swiss FTP customers pulled the EMBL CD en bloc. This is
definitely not the intention of providing ftp servers! But GOPHER, too,
starts to go bananas occasionally: now that I see that there are apparently
two sites which have set up automatic gopherized queries via whatever script
(same time of the day, same questions asked on a DAILY basis), you
will understand that I start worrying as a service provider...
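
Volume accounting of this kind could look roughly like the Python sketch
below. Only the 250 MByte figure comes from the text. The warning level
(80% of the limit), the rule that provider sites are warned but never
blocked, and the input format of (host, MBytes) pairs are assumptions made
for illustration.

```python
LIMIT_MBYTE = 250.0    # the volume suggested in the text
WARN_FRACTION = 0.8    # assumption: warn at 80% of the limit

def account(transfers, providers=()):
    """transfers: iterable of (host, mbytes) -> {host: 'ok'|'warn'|'block'}."""
    volume = {}
    for host, mbytes in transfers:
        volume[host] = volume.get(host, 0.0) + mbytes
    status = {}
    for host, total in volume.items():
        if total >= LIMIT_MBYTE and host not in providers:
            status[host] = "block"     # non-provider site over the limit
        elif total >= WARN_FRACTION * LIMIT_MBYTE:
            status[host] = "warn"      # heavy user: send a warning
        else:
            status[host] = "ok"
    return status
```

The same per-host totals would also be the starting point for spotting
scripted daily queries, although that needs timestamps as well as volumes.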
Last, let me remind you that the Pisa networking conference starting
next Monday is made for such discussions, and I hope to see some of you there.
--
+----------------------------------+-------------------------------------+
| Dr. Reinhard Doelz | RFC doelz at urz.unibas.ch |
| Biocomputing | DECNET 20579::48130::doelz |
|Biozentrum der Universitaet | X25 022846211142036::doelz |
| Klingelbergstrasse 70 | FAX x41 61 261- 6760 or 267- 2078
| CH 4056 Basel | TEL x41 61 267- 2076 or 2247 |
+------------- bioftp.unibas.ch is the SWISS EMBnet node ----------------+
-----------------------------------------