Christian Bilien’s Oracle performance and tuning blog

February 12, 2008

The “log file sync” wait event is not always spent waiting for an I/O

Filed under: Oracle,Storage — christianbilien @ 4:41 pm

I went through a rather common problem some times ago, which shows how one can get easily fooled by the “log file sync” wait event. Here is a log file sync wait histogram which was interpreted as a poor storage array response time by the DBAs, often a good way to get off the hook of a performance problem:

duration_of_wait_1.jpg

This looks like a Gaussian distribution with a mean value of maybe 30ms. The same event recorded some weeks earlier shows a mean value located between 8 and 16 ms.

duration_of_wait_2.jpg

Each log file sync may wait up to an average of 101ms over an hour:

Event                        Avg wait Time (ms)
------------------------------------------
log file sync                       101

The storage array write destination is a rather loaded EMC DMX1000, and the multiplexed log files are SRDF mirrored. SRDF is the EMC protocol which replicates I/Os from one array to another one. The replication is synchronous here, meaning the remote cache array has to acknowledge each write to the local one before returning a completion status to the LGWR. The SRDF write time must be added to the average write time to reflect the array response time. Some overhead also exists in the OS layer and possibly at the SAN level. Both redo log groups have similar write access patterns. The SRDF write time is close to 30ms over this period of time, a very bad figure. The average write time on the array is 4ms.

This is where a bit of dig in may be interesting:

All calls are emitted from a single program which basically does some small transactions and commits. We’ll consider that no other conditions may trigger a write into the redo log other than a commit.

The average log file parallel write wait over the same hours is not good, but far from the by the 101ms log file sync:

Event                          Avg wait Time (ms)
------------------------------------------
log file parallel write              37

This fits with the array I/O times.

A quick recap on the ‘log file sync’ wait event may be necessary here:

The log file sync wait event is the time the foreground processes spends waiting for the redo to be flushed. It can be assumed here that the LGWR is always active, it does not have to be awaken by the system dispatcher.The log file sync wait can be broken into:

  1. The ‘redo write time’: this is the total elapsed time of the write from the redo log buffer to the current redo log file (in centiseconds).
  2. The “log file parallel write” is basically the time for the log write I/O to complete
  3. The LGWR may have some post processing to do, then signals the waiting foreground process that the write has completed. The foreground process is finally waken up by the system dispatcher. This completes the ‘log file sync’ wait.

I also extracted two other log write statistics from the awr report:

Statistic          Total     per Second     per Trans
-----------------------------------------------------------------------------
redo write time  275,944           77.6           0.2
user commits   1,163,048          327.1           1.0

776/327 = 2.3ms per user commits of “redo write time”

Step 3 therefore accounts for 101-2.3-37=61,7ms. The most likely cause of the main contributor to the log file sync can be found here:

sar -q -f /var/adm/sa/sa07
         runq-sz %runocc swpq-sz %swpocc
05:00:01   15.9      48     0.0      0
05:05:01   24.1      42     0.0      0
05:10:01   25.2      46     0.0      0
05:15:00   15.5      38     0.0      0
05:20:01   15.6      43     0.0      0
05:25:00   14.4      38     0.0      0
05:30:00   17.2      36     0.0      0
05:35:00   18.0      47     0.0      0
05:40:00   16.9      38     0.0      0
05:45:01   12.3      33     0.0      0
05:50:00   10.1      41     0.0      0
05:55:00   11.4      39     0.0      0
06:00:00   11.1      47     0.0      0

Look at the first column, which shows the CPU run queue size. Many processes are competing for the CPUs. Although we could carry on the investigation by measuring how much time foreground processes are spending in the run queue, we can at this point conclude that the processes are spending about two thirds of the log file sync wait times waiting for the CPU, not for the I/O subsystem.

It would also be interesting for the sake of completeness to understand why the %runocc is relatively small compared to the average run queue size.

Blog at WordPress.com.