The last posting in the 'GCG on a PC' thread raised an interesting
hypothesis on which I would like more input. The author suggests that,
given the speed and ease of use of the NCBI network server, it would be
easy to do without local databases and rely entirely on network resources.
We investigated this seriously some time ago and concluded that the
work done by many 'casual' users (i.e., type in a sequence, search with
it, retrieve the top hit) can indeed be handled by networked databases.
However, consider the remaining 30% of users, who do not stop after
noticing a few insignificant hits. What happens if (1) you need to search
subsets of the database, (2) you need _many_ database entries
(e.g., 100 or 1000), or (3) you do many comparisons, statistical or
evolutionary analyses, and other individual work that should follow any
reasonable search anyway?
One of my favourites is GCG's feature of files of sequence names, which
lets me group sequences and then feed the group into any other operation.
Unless you have a very sophisticated network system, this only works if
your database lives in the same environment as your processes, which run
on local resources most of the time. To achieve _this_ over networks, we
would need a much more sophisticated way to communicate the set of
sequences we want to work on. I am not thinking of database divisions
here, but of sets of entries that do not use each sequence's full length,
only a short fragment of it.
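To make the "communicate the search set" problem concrete, here is a minimal sketch of what such a set description might look like. It assumes a simplified, hypothetical list-file layout loosely modelled on GCG's files of sequence names: free-text header lines, a line containing only "..", then one entry per line with optional "begin:"/"end:" fragment bounds and "!" comments. The real GCG format has more options than this. The names SeqSpec, parse_list_file and to_request, the bracketed request format, and the example entry names are all hypothetical and for illustration only; they are not an NCBI or GCG interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SeqSpec:
    """One member of a sequence set: an entry name plus an optional fragment."""
    name: str
    begin: Optional[int] = None
    end: Optional[int] = None


def parse_list_file(text: str) -> List[SeqSpec]:
    """Parse a simplified (hypothetical) GCG-style file of sequence names."""
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    # Drop the free-text header up to and including the '..' separator line.
    if ".." in stripped:
        lines = lines[stripped.index("..") + 1:]
    specs = []
    for raw in lines:
        line = raw.split("!", 1)[0].strip()  # '!' starts a comment (assumed)
        if not line:
            continue
        tokens = line.split()
        spec = SeqSpec(name=tokens[0])
        # Remaining tokens are read as 'begin: N' / 'end: N' pairs.
        for key, value in zip(tokens[1::2], tokens[2::2]):
            if key.lower() == "begin:":
                spec.begin = int(value)
            elif key.lower() == "end:":
                spec.end = int(value)
        specs.append(spec)
    return specs


def to_request(specs: List[SeqSpec]) -> str:
    """Serialize the set compactly, one 'NAME[begin..end]' per line (made-up format)."""
    parts = []
    for s in specs:
        if s.begin is None and s.end is None:
            parts.append(s.name)  # whole entry
        else:
            b = "" if s.begin is None else str(s.begin)
            e = "" if s.end is None else str(s.end)
            parts.append(f"{s.name}[{b}..{e}]")
    return "\n".join(parts)


# Illustrative list file; the entry names are placeholders.
example = """My globin fragment set
..
GB_PR:HUMHBB begin: 100 end: 250
GB_RO:MMHBB   ! whole entry
"""

if __name__ == "__main__":
    print(to_request(parse_list_file(example)))
```

The point of the sketch is that the set, including fragment bounds, fits in a few lines of text. Locally GCG can act on such a file directly, whereas a network server would have to accept and honour something like the request string before it could replace a local database for this kind of work.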
How would you imagine running this type of search in a networked environment?
R.Doelz Klingelbergstr.70| Tel. x41 61 267 2247 Fax x41 61 267 2078|
Biocomputing CH 4056 Basel| electronic Mail doelz at ubaclu.unibas.ch|
Biozentrum der Universitaet Basel|-------------- Switzerland ---------------|
<a href=http://beta.embnet.unibas.ch/>EMBnet Switzerland:info at ch.embnet.org</a>