In article <DAC5nJ.FIA@murdoch.acc.Virginia.EDU>, wrp@avery.med.Virginia.EDU (Bill Pearson) writes:
>We have been NFS mounting our Genbank and Protein databases and have
>not seen any effect. We support several different analysis packages -
>GCG and Eugene - and I do quite a few searches on my own (sometimes
>hundreds per day). We have never seen an effect on the network -
>certainly nothing approaching saturation.
>
Interesting, since we routinely saturate the nets whenever we FTP stuff
between our two AXPs.
What is your hardware configuration? Do you have a switching hub
or anything special?
Genbank is around 150 MB of sequence data. On our local subnet data moves
at around 700-800 kB/sec between the AXPs via binary FTP. Probably that's
about the same for NFS too. So a full search, if not compute bound, would
take about 190 seconds, or roughly three minutes, to move the whole
database. If the client cannot process the data at that rate, it will
load the net proportionally less.
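For concreteness, here's that arithmetic as a short Python sketch (just a
back-of-envelope; the figures are the estimates above, nothing measured):

    # Time to move the whole database at the observed FTP rate.
    # Figures are the rough estimates above, not measurements.
    db_size_mb = 150.0       # Genbank sequence data, ~150 MB
    rate_mb_per_s = 0.75     # ~700-800 kB/sec observed between the AXPs

    transfer_s = db_size_mb / rate_mb_per_s
    print("full pass: %.0f sec (%.1f min)" % (transfer_s, transfer_s / 60))
    # -> about 190-215 seconds, depending on which end of the rate you take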
So the key question is: how long does each of these searches take? I.e.,
what fraction of the time are you moving data, and what fraction crunching
it?
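In other words (another sketch; wall_s here is a hypothetical measured
search time, not anybody's benchmark):

    # Fraction of a search spent moving data, assuming transfer and
    # crunching do not overlap.
    transfer_s = 190.0   # full database pass, from the estimate above
    wall_s = 600.0       # hypothetical: wall-clock time of one search
    print("net busy %.0f%% of the search" % (100 * transfer_s / wall_s))
    # With several searches running at once, scale accordingly.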
Here's a command that, if run on a fast machine, would seem very likely to
saturate a net if Genbank were NFS mounted:
$ findpatterns/infile=gb:*/pattern=AGCTAGCTAGCTAGCTACGT/default
(i.e., search everything for a simple pattern that isn't found much).
Since you've got this configuration already set up, perhaps you wouldn't
mind running the experiment? Use whatever measure you like for how much
capacity remains on the net.
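One crude measure, if you want one (my sketch only; the 10 Mbit/sec figure
assumes ordinary Ethernet, so adjust link_bps for your hardware):

    # Rough remaining-capacity estimate over the life of one search.
    bytes_moved = 150e6   # the whole database streamed once
    elapsed_s = 600.0     # hypothetical measured wall-clock time
    link_bps = 10e6       # assumed raw Ethernet capacity, bits/sec

    used = bytes_moved * 8 / elapsed_s / link_bps
    print("wire %.0f%% used, %.0f%% nominally free" % (100*used, 100*(1-used)))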
Regards,
David Mathog
mathog@seqvax.bio.caltech.edu
Manager, sequence analysis facility, biology division, Caltech