Christian Bilien’s Oracle performance and tuning blog

February 11, 2007

Join the BAARF Party (or not) (2/2)

Filed under: Oracle,Storage — christianbilien @ 9:33 pm

Vendors claim that modern storage arrays can perform full stripe aggregation, which removes the RAID 5 small write overhead. If true, this makes RAID 10 less attractive (or not attractive at all), thereby dramatically reducing storage costs.

Is it true?

Before jumping into the stripe aggregation tests, you may find it useful to read the first post I wrote on striping and RAID.

Using pure sequential writes on an EMC CX600, I tried to show that full stripe aggregation does occur for small stripes (5 × 64K = 320K of data) on a RAID 5 5+1 group, i.e. almost no small write penalty has to be paid. However, once the stripe size is increased (up to a RAID 5 11+1), the full stripe aggregation just vanishes.

Test conditions:

The Navisphere Performance Analyzer is used to measure disk throughputs. It does not provide any metric showing whether full stripe aggregation is performed or not, so I generated write bursts and did some maths (which I'll develop in a later blog entry) based on the reads expected to be generated by RAID 5 writes.

Number of reads generated by writes on a RAID 5 4+1 array, where:

n : number of stripe units modified by a write request

r : number of stripe units read as a result of a request to write n stripe units

n | r | Explanation
1 | 2 | Read one stripe unit and the parity block
2 | 2 | Read the two remaining stripe units to compute the parity
3 | 1 | Read the one remaining stripe unit to compute the parity
4 | 0 | No additional reads needed

The number of reads (r) can thus be computed as a function of the number of stripe units written (n): the controller either reads the old data plus the parity (n + 1 reads, a read-modify-write) or reads the untouched units of the stripe to recompute the parity from scratch (a reconstruct-write), whichever is cheaper.
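A minimal sketch of that computation in Python (my own illustration, assuming the controller always picks the cheaper of read-modify-write and reconstruct-write):

```python
def raid5_reads_per_write(n, data_columns=4):
    """Stripe units read when n units of a RAID 5 stripe are written.

    Read-modify-write reads the n old data units plus the old parity
    (n + 1 reads); reconstruct-write reads the untouched units of the
    stripe (data_columns - n reads). A controller that aggregates
    intelligently picks the cheaper of the two.
    """
    if not 1 <= n <= data_columns:
        raise ValueError("n must be between 1 and data_columns")
    return min(n + 1, data_columns - n)

# Reproduces the 4+1 table above: [2, 2, 1, 0]
print([raid5_reads_per_write(n) for n in range(1, 5)])
```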

The stripe unit is always 64K; the number of columns in the RAID group is 5+1 in test 1 and 11+1 in test 2.

Test 1:

Small stripes (320K) on a RAID 5 5+1

I/O generator: 64 writes/s for an average throughput of 4050KB/s. The average I/O size is therefore about 64KB. No reads occur. The operating system buffer cache has been disabled.

Assuming write aggregation, writes should be performed as 320KB (full stripe) units. Knowing the operating system I/O rate (64/s), and assuming that 5 OS I/Os are aggregated into one, the RAID group I/O rate should be 64/5 = 12.8 I/O/s. The Analyzer reports an average of 13.1 I/O/s for a given disk. Write aggregation also means that no reads should be generated; the Analyzer reports 2.6 reads/s on average. Although very low, this shows that write aggregation may not always be possible, depending on particular cache conditions.
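Spelled out as a quick sketch (using only the figures quoted above; the variable names are mine):

```python
# Test 1: RAID 5 5+1, 64KB stripe unit, data stripe = 5 * 64KB = 320KB.
os_write_rate = 64        # OS writes/s, ~64KB each, no OS reads
units_per_stripe = 5      # OS I/Os coalesced into one full stripe write

# With full stripe aggregation, 5 OS writes become one group-wide write,
# and each disk in the group sees one write (data or parity) per stripe.
expected_writes_per_disk = os_write_rate / units_per_stripe   # 12.8/s
expected_reads_per_disk = 0   # a full stripe write needs no parity reads

print(expected_writes_per_disk)   # 12.8, vs. 13.1 measured
print(expected_reads_per_disk)    # 0, vs. 2.6 reads/s measured
```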

Test 2:

Large stripes (768K) on a RAID 5 11+1

82MB are sequentially written every 10s in bursts.

1. Write throughput at the disk level is 12.6 writes/s, and read throughput is 10.1 reads/s. Knowing that no reads are being sent from the OS, those figures alone show that full stripe write aggregation is not being done by the Clariion.

2. However, some aggregation (above 5 stripe units per write) is being done, as read throughput would otherwise be even higher: no aggregation at all would mean an extra 25 reads per second. The small write penalty does not vary much anyway once the number of stripe units per write is above 4.
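To see why the measured figures point at partial aggregation, here is a back-of-envelope model of my own (not necessarily the exact computation used above): it assumes 82MB every 10s = 8.2MB/s of 64KB stripe units hitting the 11+1 group, applies the r(n) formula from the table, and spreads the reads evenly over the 12 disks.

```python
DATA_COLUMNS = 11                 # RAID 5 11+1: 11 data columns + 1 parity
unit_rate = 8.2 * 1024 / 64       # ~131 stripe units written per second

def reads_per_disk(n):
    """Estimated per-disk read rate when writes reach the group in bundles
    of n stripe units (n = 1 means no aggregation at all)."""
    bundles_per_sec = unit_rate / n
    reads_per_bundle = min(n + 1, DATA_COLUMNS - n)
    return bundles_per_sec * reads_per_bundle / 12

for n in (1, 2, 5, 8, 11):
    print(f"n={n:2d}: ~{reads_per_disk(n):4.1f} reads/s per disk")
# n=1 gives ~21.9 reads/s per disk, about twice the measured 10.1/s;
# the measured rate falls between the n=5 (~13.1) and n=11 (0) cases,
# consistent with aggregation above 5 stripe units per write.
```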


