Christian Bilien’s Oracle performance and tuning blog

February 10, 2007

Storage array cache sizing

Filed under: Oracle,Storage — christianbilien @ 11:28 am

I often have to argue, or even fight, with storage providers about cache sizing. Here is the set of rules I apply in proofs of concept and disk I/O modelling.

Write cache size:

1. Cache sizing: the real issue is not the cache size, but how fast the cache can flush to disk. In other words, under sustained I/O, the cache will fill and I/O will bottleneck if one condition is met: the rate of incoming I/O is greater than the rate at which the cache can flush (a short back-of-the-envelope sketch follows this list).

2. The size of a write cache matters when the array must handle a burst of write I/O. A larger cache can absorb bigger write bursts, such as database checkpoints, so the burst can be contained without triggering a forced flush.

3. Write cache mirroring from one SP (storage processor) to the other is normally activated for redundancy and to remove a single point of failure. That is, the write cache in each SP contains both the primary cached data for the logical units it owns and a secondary copy of the cache data for the LUNs owned by its peer SP. In other words, SP1’s write cache holds a copy of SP2’s write cache and vice versa. Overall, the effective write cache size (from a performance point of view) is half the configured write cache.

4. Write caching is used for RAID 5 full-stripe aggregation (when the storage firmware supports this feature) and parity calculation, a very useful feature for many large environments. Lack of space must not force the cache to destage.
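To make point 1 (and the halving in point 3) concrete, here is a minimal back-of-the-envelope sketch. All figures are hypothetical and not measurements from any particular array:

```python
# Back-of-the-envelope write cache model (illustrative numbers only).
# Assumption: the write cache is mirrored between the two SPs, so the
# usable size is half the installed cache (point 3 above).

installed_cache_gb = 8                       # hypothetical installed write cache
usable_cache_gb = installed_cache_gb / 2     # mirrored across the two SPs

incoming_mb_s = 400                          # hypothetical sustained host write rate
destage_mb_s = 250                           # hypothetical flush (destage) rate to disk

net_fill_mb_s = incoming_mb_s - destage_mb_s
if net_fill_mb_s <= 0:
    print("The cache never fills: flushing keeps up with incoming writes")
else:
    seconds_before_forced_flush = usable_cache_gb * 1024 / net_fill_mb_s
    print(f"The cache absorbs the burst for ~{seconds_before_forced_flush:.0f} s "
          "before a forced flush")
```

With these made-up numbers the cache only buys about half a minute of burst; past that point the array is entirely governed by how fast it can destage.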

Read cache size:

1. Random reads have little chance of being in cache: the storage array read cache is unlikely to provide much value on top of the SGA and possibly the file system buffer cache (depending on FILESYSTEMIO_OPTIONS and file system direct I/O mount options). A crude estimate of the expected hit ratio follows the list below.

2. However, array read caches are very useful for prefetches. I can see two cases where this occurs:

  • Oracle direct reads (or direct path reads) are operating system asynchronous reads (I’ll write a new post on this later). Assuming a well designed back end, you will be able to use a large disk bandwidth and take advantage of a large number of parallel disk I/Os.
  • Buffered reads (imports, for example) will take advantage of both file system and storage array read-ahead operations.
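To put a rough number on point 1, the chance that a purely random single-block read hits the array read cache is roughly the cache size divided by the active data set size, which is usually tiny. A minimal sketch with made-up figures:

```python
# Rough estimate of the array read cache hit ratio for random reads.
# Assumes a uniformly random access pattern over the active data set and
# ignores the reads already satisfied by the SGA or file system cache.

read_cache_gb = 16          # hypothetical array read cache
active_data_set_gb = 2000   # hypothetical active data set

hit_ratio = read_cache_gb / active_data_set_gb
print(f"Expected array read cache hit ratio: {hit_ratio:.1%}")   # ~0.8%
```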

Also take a look at http://www.oracle.com/technology/deploy/availability/pdf/oow2000_same.pdf

4 Comments »

  1. We have been running into several issues with the write cache filling up and several hosts experiencing large I/O waits due to a forced flush. Because of my general lack of storage knowledge and the general lack of DBMS knowledge in our storage group, we seem to be doing a complicated dance around what I think should be a relatively simple resolution. But maybe it is complicated. *grin*

    For instance, we try to maximize the speed of cold backups by using massively parallel scripts (one for each datafile, which totals around 200). Inevitably, we saturate the write cache with dirty buffers and they all have to be flushed (forced). Obviously, the sysadmin folks can turn off the write cache to avoid the forced flush, but that slows down I/O for all connected hosts.

    Is there a happy medium somewhere? Yes, we could scale back how many parallel processes run at once. But how much? The optimal number of parallel processes will never be the same since the SAN is shared. Apparently, we have a slightly older CX (700) and we do not have tools to dynamically cap I/O requests. I believe we already switched over to fibre channel, but I am double-checking that right now. If you would like to see more numbers, just ask; I do not have a solid grasp on what disk/configuration statistics would be critical.

    Any and all advice would be appreciated. I am trying to get my head around your other related posts as well. I have a lot to learn. =)

    Comment by Charles Schultz — January 11, 2008 @ 2:44 pm

  2. Hi Charles,

    What you are experiencing looks to me like the “classic” I/O priority issue you encounter on entry-level and mid-range boxes. The DMX has a concept of priority queues, which tremendously enhances I/O response time under heavy load and prevents starvation. I would try one of two things: either disable LUN write caching for the backups, which means that they may run slower (but not slower than if the hosts have to wait a lot on forced flushes), or, if you do not want to disable write caching, ensure that the backups emit I/O larger than the “write aside” threshold (default 512 KB, I think). Both actions tend to avoid write cache flooding (a small sketch of the large-I/O idea follows the comment thread below).

    Cheers

    Christian

    Comment by christianbilien — January 14, 2008 @ 8:07 pm

  3. Here is the part that confuses me. Say we leave the write cache on; it takes about 2.5 hours to back up a 430 GB database. If we turn the write cache off, the backup then takes anywhere from 8 to 12 hours. I have been unable to comprehend the difference, which leads me to believe that there is something else going on beyond just the write cache issues.

    I mentioned fibre-channel in the original comment; we are not using fibre-channel, but stock ATA drives.

    Your thought on “write aside” is interesting; Google tells me that it is a way to set up the LUN such that large I/O requests bypass the cache. However, if we have already proved that the write cache far outperforms no cache, I do not see how that would help, other than to dynamically allow other users the benefit of the write cache while keeping large I/O out. Perhaps that is the best way to go during the daytime….

    Food for thought. Thanks much!

    Comment by Charles Schultz — January 16, 2008 @ 2:50 pm

  4. Hi Charles,

    Sorry I’m late replying to your comment.
    – Backup time: the backup I/O rate is presumably not steady, which means that a non-negligible part of the backup writes can still be buffered (hence the 8 vs. 12 hour spread).
    – I suggested write aside to avoid flooding the cache and delaying the other applications.
    – Stock ATA drives: is it a direct-attached SCSI storage array? Are they RAID 5? One of the things you could look at is the disastrous maximum write rate you’ll get with ATA. I checked a configuration I know for you: on a CLARiiON CX3-80, I got 1 MB/s for a RAID 5 5+1!
    Cheers

    Christian

    Comment by christianbilien — January 29, 2008 @ 10:03 am
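As a footnote to the “write aside” exchange above: the idea is simply that each backup I/O should be larger than the array’s write-aside threshold so that it bypasses the write cache instead of flooding it. Below is a minimal, hypothetical sketch of a datafile copy issuing 1 MB I/Os; the paths are made up, and whether the operating system actually preserves the I/O size down to the array depends on direct I/O settings, which the sketch ignores:

```python
# Copy a datafile with large (1 MB) reads and writes so that, on an array
# whose write-aside threshold is below 1 MB, the backup I/O bypasses the
# write cache rather than flooding it. Paths are hypothetical examples.

import shutil

CHUNK = 1024 * 1024   # 1 MB per I/O, above an assumed 512 KB write-aside limit

def copy_datafile(src: str, dst: str) -> None:
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=CHUNK)

# copy_datafile("/u01/oradata/PROD/users01.dbf", "/backup/users01.dbf")
```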
