RAID for DNA databases???

micha at amber.biophys.uni-duesseldorf.de
Sat Oct 8 14:17:09 EST 1994

Bo Servenius (Bo.Servenius at wblab.lu.se) wrote:
: Dear Netters!

: I am considering buying a SPARCstorage Array 100 for my Sun
: SPARCstation 10-54
: system. I am running the GCG package and the standard databases such as EMBL
: and Swissprot.

: I have no experience with the technology or with RAID concepts. For
: example, how will the different RAID levels influence the performance
: of database I/O for our applications?

Hi Bosse,

Here is some (theoretical) information about RAID, taken from a 'hardware
discounter' catalogue. (We are not using RAID ourselves, and the disk sizes
currently available seem sufficient to me for a while, at least for EMBL,
at 400 MB, and SwissProt.)
One remark: RAID is designed for data security, not for implementing
infinite disk size!

RAID 0 : disk striping; consecutive blocks are stored on different disks of
the RAID set. Blocks may differ in size from the 512 k standard size,
but small blocks cost performance on short write requests (not
the usual case for GCG databases) because of the time needed to locate
the disk on which each data item is written. Setting the striping factor
to > 512 k ensures that a short request involves only one disk.
Long read transfers should be faster than on a single disk, provided striping
isn't already implemented by the OS.
Problem: a single crashed disk leaves the whole set unreadable!
RAID 1 : mirroring (two copies of a disk or of a RAID 0 set); only half the
disk space is usable, read speed is at most doubled (expensive for GCG
databases).
RAID 2 : striping with ECC disks; read performance is like RAID 0, and so are
long writes, but short writes require reading all disks to recompute the ECC!
-> normally a low striping factor is used; big overhead!
RAID 3 : striping with a parity disk; performance is like RAID 2 unless a high
striping factor is used: good long-read performance, low write performance.
RAID 4 : in principle RAID 3 with a striping factor > 512 k -> better
performance on short writes.
RAID 5 : parity (always required, and the bottleneck when it sits on a single
disk) is distributed over all disks.
Faster than RAID 4, since the parity bottleneck is eliminated.
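
To make the striping-factor argument concrete, here is a minimal Python
sketch of how a striped set might map a request onto disks. The function
name, the round-robin layout and all the sizes are assumptions chosen for
illustration; none of them come from the catalogue or from any particular
RAID product.

```python
def disks_touched(offset, length, stripe_unit, n_disks):
    """Return the set of disk indices touched by the byte range
    [offset, offset + length), assuming stripe units are handed out
    round-robin across the disks (hypothetical layout)."""
    first_chunk = offset // stripe_unit
    last_chunk = (offset + length - 1) // stripe_unit
    return {chunk % n_disks for chunk in range(first_chunk, last_chunk + 1)}


# Illustrative numbers only: an 8 KB request on a 4-disk set.
small_unit = disks_touched(offset=0, length=8192, stripe_unit=1024, n_disks=4)
large_unit = disks_touched(offset=0, length=8192, stripe_unit=65536, n_disks=4)

print(sorted(small_unit))  # every disk is involved
print(sorted(large_unit))  # a single disk serves the request
```

With the small stripe unit the short request is spread over every disk in
the set; with the large one it stays on a single disk, as long as it does
not happen to straddle a stripe-unit boundary.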

For database disks, RAID 4 or 5 with high striping factor should be the
best solution (if host-based striping isn't enough!).
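
The parity idea behind RAID 3, 4 and 5 can be sketched in a few lines of
Python. This is only an illustration: the helper names, the tiny 4-byte
"blocks" and the particular rotation scheme are my assumptions, and real
arrays may use different layouts.

```python
from functools import reduce


def parity(blocks):
    """XOR equal-sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe, n_disks, rotate):
    """Disk holding the parity of a given stripe: always the last disk
    (RAID 4 style) or a position that rotates per stripe (RAID 5 style).
    The rotation order used here is one hypothetical choice."""
    if rotate:
        return n_disks - 1 - (stripe % n_disks)
    return n_disks - 1


# Three data blocks of one stripe (made-up contents) plus their parity.
data = [b"EMBL", b"GCG!", b"SWIS"]
p = parity(data)

# Pretend the disk holding data[1] has crashed: XOR of the survivors
# and the parity block gives the lost block back.
rebuilt = parity([data[0], data[2], p])

fixed_layout = [parity_disk(s, 4, rotate=False) for s in range(4)]
rotating_layout = [parity_disk(s, 4, rotate=True) for s in range(4)]
print(rebuilt == data[1], fixed_layout, rotating_layout)
```

With a fixed parity disk every write has to update that one disk, which is
the bottleneck mentioned above; with rotation the parity updates of
different stripes land on different disks.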

So much for the theoretical stuff; maybe someone can come up with practical experience :-)

	Michael Schmitz

More information about the Bio-soft mailing list
