Christian Bilien’s Oracle performance and tuning blog

Asynchronous checkpoints (db file parallel write waits) and the physics of distance

The first post devoted to the physics of distance (“Log file write time and the physics of distance”) targeted log file writes and “log file sync” waits. It assumed that:

This set of assumptions is legitimate if the application is indeed “waiting” (i.e. not consuming CPU) on log file writes but not on any other I/O-related event, and if the fraction of available bandwidth is large enough that a frame is not delayed by other applications sharing the same pipe, such as an array replication.

Another common Oracle event is the checkpoint completion wait (db file parallel write). In this post I’ll explore how the replication distance influences checkpoint durations. Streams of small transactions make the calling program synchronous with the log file write, but checkpoint writes are by nature much less critical because they are asynchronous from the user program’s perspective. They only degrade response time when “db file parallel write” waits start to appear. The word “asynchronous” could be a source of confusion, but it is not here: the checkpoint I/Os are doubly asynchronous, because the I/Os are also asynchronous at the DBWR level.

1. Synchronous writes: relationship of I/O/s to throughput and percent bandwidth

We did some maths in figure 3 of “Log file write time and the physics of distance” aimed at calculating the time to complete a log write. Let’s do the same with larger writes over a 50km distance on a 2Gb/s FC link. We’ll also add a couple of columns: the number of I/Os per second and the fraction of used bandwidth. 2Gb/s translates into 200MB/s because Fibre Channel uses 8b/10b encoding: each byte is carried as 10 bits on the wire.

 

Figure 1: throughput and percent bandwidth as a function of the I/O size (synchronous writes)

| I/O size (KB) | Time to load (ms) | Round trip latency (ms) | Overhead (ms) | Time to complete an I/O (ms) | I/O/s | Throughput (MB/s) | Percent bandwidth |
|---|---|---|---|---|---|---|---|
| 2 | 0.054 | 0.5 | 0.6 | 1.154 | 867 | 1.7 | 0.8% |
| 16 | 0.432 | 0.5 | 0.6 | 1.532 | 653 | 10.2 | 5.1% |
| 32 | 0.864 | 0.5 | 0.6 | 1.964 | 509 | 15.9 | 8.0% |
| 64 | 1.728 | 0.5 | 0.6 | 2.828 | 354 | 22.1 | 11.1% |
| 128 | 3.456 | 0.5 | 0.6 | 4.556 | 219 | 27.4 | 13.7% |
| 256 | 6.912 | 0.5 | 0.6 | 8.012 | 125 | 31.2 | 15.6% |
| 512 | 13.824 | 0.5 | 0.6 | 14.924 | 67 | 33.5 | 16.8% |
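The derived columns follow mechanically from the three timing components. Here is a minimal Python sketch of that arithmetic: the per-KB load times, the 0.5ms round trip (50km each way at 5µs/km) and the fixed 0.6ms overhead are simply taken as inputs from the model of the first post, and 200MB/s is the nominal payload rate of the 2Gb/s link after 8b/10b encoding.

```python
# Sketch of the figure 1 arithmetic. Assumptions: the load and overhead figures
# come from the model used in "Log file write time and the physics of distance";
# 2Gb/s FC ~ 200MB/s payload after 8b/10b encoding.

LINK_MB_PER_S = 200.0            # nominal 2Gb/s FC payload bandwidth
ROUND_TRIP_MS = 2 * 50 * 0.005   # 50km each way at 5us/km = 0.5ms
OVERHEAD_MS = 0.6                # fixed overhead assumed in the first post

def synchronous_write(io_size_kb, load_ms):
    """Time to complete one synchronous write and the resulting rates."""
    complete_ms = load_ms + ROUND_TRIP_MS + OVERHEAD_MS
    ios_per_s = 1000.0 / complete_ms                  # one I/O at a time
    throughput_mb_s = ios_per_s * io_size_kb / 1024.0
    pct_bandwidth = 100.0 * throughput_mb_s / LINK_MB_PER_S
    return complete_ms, ios_per_s, throughput_mb_s, pct_bandwidth

# Load times (ms) per I/O size, as listed in figure 1
for size_kb, load_ms in [(2, 0.054), (16, 0.432), (32, 0.864), (64, 1.728),
                         (128, 3.456), (256, 6.912), (512, 13.824)]:
    t, iops, mb, pct = synchronous_write(size_kb, load_ms)
    print(f"{size_kb:>4}KB  {t:6.3f}ms  {iops:4.0f} IO/s  {mb:5.1f}MB/s  {pct:4.1f}%")
```

Running it reproduces the last four columns of figure 1 (within rounding), which makes it easy to play with other distances or overhead figures.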

So what changes should we expect to the above results if we move from synchronous to asynchronous writes?

2. Asynchronous writes

Instead of firing one write at a time and waiting for completion before issuing the next one, we’ll stream writes one after the other, leaving no “gap” between consecutive writes.

Three new elements will influence the expected maximum number of I/O streams in the pipe:

Over 50km, and knowing that light travels in fibre at about 5 microseconds per kilometre, the relationship between the I/O size and the length the packet occupies in the pipe is shown in figure 2:

Figure 2: relationship between the I/O size and the packet length in the fibre channel pipe

| I/O size (KB) | Time to load (µs) | Packet length (km) |
|---|---|---|
| 2 | 10.24 | 2 |
| 32 | 163.84 | 33 |
| 64 | 327.68 | 66 |
| 128 | 655.36 | 131 |
| 256 | 1310.72 | 262 |
| 512 | 2621.44 | 524 |

The packet length for 2KB writes means a capacity of 25 outstanding I/Os is needed to fill the 50km pipe, whereas only one I/O can be in flight at a time for 128KB packet streams. Again, this only holds true if the “space” between frames is negligible.

Assuming zero gap between 2KB frames, an observation post would see an I/O pass by every 10µs, which corresponds to 100,000 I/O/s. At that point the replication link is no longer the bottleneck: other limiting factors, such as the storage arrays and the hosts at both ends, take precedence. However, only a single 128KB packet will be in the pipe at any given time: the next one has to wait for the previous one to complete. Sounds familiar, doesn’t it? When the packet length exceeds the length of the pipe, asynchronous writes gain nothing from streaming: they behave synchronously.
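To make the figure 2 numbers and the 25-outstanding-I/Os figure concrete, here is a small Python sketch under the same assumptions (200MB/s payload rate, 5µs/km in fibre, 50km pipe, zero gap between consecutive frames); the function name and output format are just illustrative.

```python
# Sketch of the figure 2 arithmetic and of the streaming capacity of the pipe.
# Assumptions: 2Gb/s FC ~ 200MB/s payload, ~5us/km propagation in fibre,
# 50km one-way distance, zero gap between consecutive frames.

LINK_MB_PER_S = 200.0
US_PER_KM = 5.0
DISTANCE_KM = 50.0

def stream_profile(io_size_kb):
    load_us = io_size_kb * 1024 / (LINK_MB_PER_S * 1e6) * 1e6  # time to put the I/O on the wire
    packet_km = load_us / US_PER_KM                            # length it occupies in the fibre
    in_flight = DISTANCE_KM / packet_km                        # back-to-back I/Os that fit in the pipe
    # While more than one packet fits, the zero-gap rate is link-limited;
    # otherwise a single packet occupies the whole pipe and the stream
    # degenerates to synchronous behaviour.
    ios_per_s = 1e6 / load_us if in_flight >= 1 else None
    return load_us, packet_km, in_flight, ios_per_s

for size_kb in (2, 32, 64, 128, 256, 512):
    load, km, n, iops = stream_profile(size_kb)
    rate = f"~{iops:.0f} I/O/s" if iops else "pipe holds <1 packet: behaves synchronously"
    print(f"{size_kb:>4}KB  load {load:8.2f}us  packet {km:6.2f}km  ~{n:4.1f} in flight  {rate}")
```

For 2KB writes it reports roughly 25 packets in flight and close to 100,000 I/O/s; from 64KB upwards the packet is longer than the 50km pipe and the asynchronous stream no longer hides the distance.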

 
