Christian Bilien's Oracle performance and tuning blog

Join the BAARF Party (or not) (2/2)


If modern storage arrays can really aggregate writes into full stripes, as vendors claim, the RAID 5 write overhead disappears. RAID 10 then becomes less attractive (or not attractive at all), and storage costs drop dramatically.

Is it true?

Before jumping into the stripe aggregation tests, you may find it useful to read the first post I wrote on striping and RAID.

Using pure sequential writes on an EMC CX600, I tried to show that full stripe aggregation exists for small stripes (5*64K=320K) on a RAID 5 5+1, i.e. that almost no small write penalty has to be paid. However, once the stripe size increases (up to a RAID 5 10+1), full stripe aggregation vanishes.

Test conditions:

The Navisphere Performance Analyzer is used to measure disk throughput. It does not provide any metric that shows whether full stripe aggregation is performed or not. So I just generated write bursts and did some maths, based on the reads that RAID 5 writes are expected to generate (I'll develop this in a later blog entry).

Number of reads generated by writes on a RAID 5 4+1 array, where:

n : number of stripe units modified by a write request

r : number of stripe units read as a result of a request to write n stripe units

n = 1, r = 2: Read one stripe unit and the parity block

n = 2, r = 2: Read two additional stripe units to compute the parity

n = 3, r = 1: Read one more stripe unit to compute the parity

n = 4, r = 0: No additional reads needed

In this way you can compute the number of reads (r) as a function of the number of stripe units written (n).
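The read counts listed above for a 4+1 array can be reproduced with a short sketch. It assumes that the controller always picks the cheaper of two strategies: read-modify-write (read the n old stripe units plus the old parity) or reconstruct-write (read the stripe units that are not modified). That policy, the function name reads_for_write and the parameter data_units are assumptions made for illustration, not something the array documents.

```python
def reads_for_write(n: int, data_units: int = 4) -> int:
    """Stripe units read before writing n stripe units of a single stripe.

    Assumption: the controller chooses whichever strategy needs fewer reads.
    """
    if not 1 <= n <= data_units:
        raise ValueError("n must be between 1 and data_units")
    if n == data_units:
        # Full stripe write: parity is computed from the new data alone.
        return 0
    read_modify_write = n + 1           # old copies of the n units + old parity
    reconstruct_write = data_units - n  # the stripe units left untouched
    return min(read_modify_write, reconstruct_write)


# RAID 5 4+1: prints 1 2, 2 2, 3 1, 4 0 (one pair per line)
for n in range(1, 5):
    print(n, reads_for_write(n))
```

Calling reads_for_write(n, data_units=5) gives the same kind of table for the 5+1 group used in test 1.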

The stripe unit is always 64k, the number of columns in the RAID group will be 5+1 in test 1, 11+1 in test 2.

Test 1:

Small stripes (320K) on a RAID 5 5+1

I/O generator: 64 writes/s for an average throughput of 4050 kB/s. The average I/O size is therefore about 64 KB. No reads occur. The operating system buffer cache has been disabled.

Assuming write aggregation, writes should be performed as 320KB (full stripe) units. Knowing the operating system I/O rate (64/s), and assuming that 5 OS I/Os are aggregated into one, the RAID group I/O rate should be 64/5 = 12.8 I/O/s. The Analyzer gives an average of 13.1 I/O/s for a particular disk. Write aggregation also means that no reads should be generated. The Analyzer reports 2.6 reads/s on average. Although very low, this shows that write aggregation may not always be possible, depending on the particular cache situation.
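The arithmetic behind test 1 can be checked with a few lines. This is only a sketch of the calculation described above; the function name test1_expectations and its parameter names are made up for the example, and the input figures are the ones quoted in the text.

```python
def test1_expectations(os_writes_per_s: float = 64.0,
                       throughput_kb_per_s: float = 4050.0,
                       stripe_unit_kb: int = 64,
                       data_units: int = 5):
    """Return (average OS write size in KB, full stripe size in KB,
    expected RAID group I/O rate if data_units OS writes become one)."""
    avg_io_kb = throughput_kb_per_s / os_writes_per_s
    full_stripe_kb = stripe_unit_kb * data_units
    raid_group_ios_per_s = os_writes_per_s / data_units
    return avg_io_kb, full_stripe_kb, raid_group_ios_per_s


avg_io_kb, full_stripe_kb, rate = test1_expectations()
print(f"average OS write size : {avg_io_kb:.1f} KB")
print(f"full stripe size      : {full_stripe_kb} KB")
print(f"expected I/O rate     : {rate:.1f} I/O/s")
```

The expected rate of 12.8 I/O/s is the figure to compare against what the Analyzer reports for a disk.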

Test 2:

Large stripes (768K)

82MB are sequentially written every 10s in bursts.

1. Write throughput at the disk level is 12.6 writes/s and read throughput is 10.1 reads/s. Knowing that no reads are being sent from the OS, those figures alone show that full stripe write aggregation is not being performed by the Clariion.

2. However, some aggregation (above 5 stripe units per write) is taking place, as read throughput would otherwise be even higher: no aggregation would mean an extra 25 reads per second. In any case, the small write penalty does not vary greatly once the number of stripe units per write is above 4.
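As a rough cross-check of test 2, the burst figures can be converted into the per-disk write rate that perfect full stripe aggregation would produce. This is a sketch under two loudly stated assumptions: that 1 MB = 1024 KB, and that the 768K figure is the amount of data carried by one full stripe write. The function and parameter names are hypothetical.

```python
def full_stripe_writes_per_s(burst_mb: float = 82.0,
                             burst_interval_s: float = 10.0,
                             full_stripe_kb: float = 768.0,
                             kb_per_mb: float = 1024.0) -> float:
    """Average number of full stripe writes per second needed to absorb the bursts.

    Assumptions: 1 MB = 1024 KB, and one full stripe write carries full_stripe_kb of data.
    """
    return burst_mb * kb_per_mb / burst_interval_s / full_stripe_kb


expected = full_stripe_writes_per_s()
observed_writes_per_s = 12.6  # per disk, from the Analyzer
observed_reads_per_s = 10.1   # per disk, from the Analyzer

print(f"expected writes/s per disk with full stripe aggregation: {expected:.1f}")
print(f"observed writes/s per disk: {observed_writes_per_s}")
print(f"observed reads/s per disk : {observed_reads_per_s} (0 expected)")
```

With full stripe aggregation each disk would receive one stripe unit per full stripe write and no reads at all, so the non-zero read rate is the telling figure.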