Christian Bilien’s Oracle performance and tuning blog

July 2, 2007

Asynchronous checkpoints (db file parallel write waits) and the physics of distance

Filed under: HP-UX,Oracle,Solaris,Storage — christianbilien @ 5:15 pm

The first post (“Log file write time and the physics of distance”) devoted to the physics of distance targeted log file writes and “log file sync” waits. It assumed that:

  • The percentage of bandwidth occupied by all the applications which share the pipe was negligible
  • No other I/O subsystem waits were occurring.
  • The application streams writes, i.e. it is able to issue an I/O as soon as the channel is open.

This set of assumptions is legitimate if an application is indeed “waiting” (i.e. not consuming CPU) on log file writes but not on any other I/O related events, and if the fraction of available bandwidth is large enough for a frame not to be delayed by other applications which share the same pipe, such as an array replication.

Another common Oracle event is the checkpoint completion wait (db file parallel write). In this post I’ll explore how the replication distance influences checkpoint durations. Streams of small transactions make the calling program synchronous with the log file write, but checkpoint writes are much less critical by nature because they are asynchronous from the user program’s perspective. They only hurt response time when “db file parallel write” waits start to appear. The word “asynchronous” could be a source of confusion, but it is not here. The checkpoints I/Os are doubly asynchronous, because the I/Os are also asynchronous at the DBWR level.

1. Synchronous writes: relationship of I/O/s to throughput and percent bandwidth

We did some maths in figure 3 of “Log file write time and the physics of distance” to calculate the time to complete a log write. Let’s do the same with larger writes over a 50km distance on a 2Gb/s FC link. We’ll also add a couple of columns: the number of I/O/s and the fraction of used bandwidth. 2Gb/s = 200MB/s because Fibre Channel uses 8b/10b encoding, so each byte takes 10 bits on the wire.

 

Figure 1: throughput and percent bandwidth as a function of the I/O size (synchronous writes)

I/O size  Time to    Round trip    Overhead  Time to complete  IO/s  Throughput  Percent
(kB)      load (ms)  latency (ms)  (ms)      an I/O (ms)             (MB/s)      bandwidth

2         0.054      0.5           0.6       1.154             867   1.7         0.8%
16        0.432      0.5           0.6       1.532             653   10.2        5.1%
32        0.864      0.5           0.6       1.964             509   15.9        8.0%
64        1.728      0.5           0.6       2.828             354   22.1        11.1%
128       3.456      0.5           0.6       4.556             219   27.4        13.7%
256       6.912      0.5           0.6       8.012             125   31.2        15.6%
512       13.824     0.5           0.6       14.924            67    33.5        16.8%
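
The arithmetic behind figure 1 can be sketched in a few lines of Python. This is only an illustration, not the spreadsheet used for the post: the constant `LOAD_MS_PER_KB` is an assumption inferred from the “Time to load” column (0.054 ms for 2 kB), and the function name `sync_write` is made up for the example. Throughput is taken as IO/s times the I/O size with 1 MB = 1024 kB, which appears to be what the figure uses.

```python
# Sketch of the synchronous write model behind figure 1.
LOAD_MS_PER_KB = 0.027   # assumption: inferred from the "Time to load" column
ROUND_TRIP_MS = 0.5      # 50 km out and back at about 5 microseconds per km
OVERHEAD_MS = 0.6        # fixed overhead column of figure 1
LINK_MB_PER_S = 200.0    # 2 Gb/s FC link

def sync_write(io_kb):
    """Return (time to complete in ms, IO/s, throughput in MB/s, percent bandwidth)."""
    load_ms = io_kb * LOAD_MS_PER_KB
    complete_ms = load_ms + ROUND_TRIP_MS + OVERHEAD_MS
    # One write at a time: the next I/O starts only when the previous one completes.
    ios_per_s = 1000.0 / complete_ms
    throughput_mb_s = ios_per_s * io_kb / 1024.0
    pct_bandwidth = 100.0 * throughput_mb_s / LINK_MB_PER_S
    return complete_ms, ios_per_s, throughput_mb_s, pct_bandwidth

if __name__ == "__main__":
    for size in (2, 16, 32, 64, 128, 256, 512):
        ms, iops, mbs, pct = sync_write(size)
        print(f"{size:>4} kB  {ms:7.3f} ms  {iops:4.0f} IO/s  {mbs:5.1f} MB/s  {pct:4.1f}%")
```

The fixed 1.1 ms of latency and overhead dominates the small writes, which is why even 512 kB synchronous writes use well under a fifth of the link.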

So what change should we expect in the above results if we switch from synchronous writes to asynchronous writes?

2. Asynchronous writes

Instead of firing one write at a time and waiting for completion before issuing the next one, we’ll stream writes one after the other, leaving no “gap” between consecutive writes.

Three new elements will influence the expected maximum number of I/O streams in the pipe:

  • Channel buffer-to-buffer credits
  • Number of outstanding I/Os (if any) the controller can support. This is 32, for example, for an HP EVA
  • Number of outstanding I/Os (if any) the system, or a SCSI target, can support. On HP-UX, for example, the default number of I/Os that a single SCSI target will queue up for execution is 8; the maximum is 255.

Over 50km, and knowing that light propagates in fiber at about 5 microseconds per kilometer, the relationship between the I/O size and the packet size in the pipe is shown in figure 2:

Figure 2: relationship between the I/O size and the packet size in the Fibre Channel pipe

I/O size  Time to load  Packet length
(kB)      (µs)          (km)

2         10.24         2
32        163.84        33
64        327.68        66
128       655.36        131
256       1310.72       262
512       2621.44       524

With a packet length of about 2km, 2KB writes require a capacity of 25 outstanding I/Os to fill the 50km pipe, but only one I/O can be active for 128KB packet streams. Again, this statement only holds true if the “space” between frames is negligible.
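
As a rough check of figure 2 and of the 25 outstanding I/Os quoted above, here is a small Python sketch. The function names are hypothetical, and it assumes a zero gap between frames, a 200MB/s link and the packet lengths rounded to the kilometer as in figure 2.

```python
# Sketch: how long is a write "on the wire", and how many fit in the pipe?
LINK_BYTES_PER_US = 200.0   # 2 Gb/s FC link = 200 MB/s = 200 bytes per microsecond
DELAY_US_PER_KM = 5.0       # propagation delay in fiber
PIPE_KM = 50.0              # replication distance

def time_to_load_us(io_kb):
    """Time needed to put the whole I/O on the link, in microseconds."""
    return io_kb * 1024 / LINK_BYTES_PER_US

def packet_length_km(io_kb):
    """Length of fiber occupied by the I/O while it is being sent."""
    return time_to_load_us(io_kb) / DELAY_US_PER_KM

def ios_in_pipe(io_kb):
    """Outstanding I/Os needed to fill the pipe (at least one is always active)."""
    # Uses the packet length rounded to the km, as figure 2 does.
    return max(1, int(PIPE_KM // round(packet_length_km(io_kb))))

if __name__ == "__main__":
    for size in (2, 32, 64, 128, 256, 512):
        print(size, time_to_load_us(size), round(packet_length_km(size)), ios_in_pipe(size))
```

With the unrounded 2.048km packet length the count for 2KB writes would be 24 rather than 25; the difference does not change the argument.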

Assuming a zero gap between 2KB frames, an observation post would see an I/O pass through every 10µs, which corresponds to about 100,000 I/O/s. At this point replication is no longer the bottleneck: other limiting factors, such as the storage arrays and the computers at both ends, now take precedence. However, only a single 128KB packet will be in the pipe at a given time: the next one has to wait for the previous one to complete. Sounds familiar, doesn’t it? When the packet size exceeds the window size, asynchronous I/O brings no benefit over the replication link, because asynchronous writes behave synchronously.
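
The two regimes can be told apart with a short Python sketch. It is an illustration under stated assumptions only: the helper names are invented, and the “window” is taken to be the 50km pipe itself, which is one reading of the paragraph above.

```python
# Sketch: zero-gap streaming rate, and when streaming degrades to synchronous behaviour.
LINK_BYTES_PER_US = 200.0   # 2 Gb/s FC link = 200 bytes per microsecond
DELAY_US_PER_KM = 5.0       # propagation delay in fiber
PIPE_KM = 50.0              # assumption: the window is the length of the pipe

def zero_gap_ios_per_s(io_kb):
    """I/O rate seen by an observation post when packets follow each other with no gap."""
    load_us = io_kb * 1024 / LINK_BYTES_PER_US
    return 1e6 / load_us

def behaves_synchronously(io_kb):
    """True when a single packet is longer than the pipe, so only one I/O is in flight."""
    packet_km = io_kb * 1024 / LINK_BYTES_PER_US / DELAY_US_PER_KM
    return packet_km > PIPE_KM

if __name__ == "__main__":
    print(round(zero_gap_ios_per_s(2)))   # close to the 100,000 I/O/s quoted above
    for size in (2, 32, 64, 128):
        print(size, behaves_synchronously(size))
```

Under these assumptions the switch happens between 32KB and 64KB writes, since a 64KB packet (about 66km) is already longer than the 50km pipe.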

 

8 Comments »

  1. Christian,

    I found your blog via Jonathan Lewis’ and it looks very interesting indeed. However, when I go to https://christianbilien.wordpress.com it takes me to your posting on Feb 7th, rather than your latest posting, which seems a little strange.

    Regards,

    Doug

    Comment by Doug Burns — July 16, 2007 @ 8:39 pm

  2. […] blog via Jonathan Lewis’ blog-roll. Unfortunately, that link takes you to an old post, rather than the current one, but I’ve left a comment and hopefully Christian can sort that out.Looks really interesting, though […]

    Pingback by Doug's Oracle Blog — July 16, 2007 @ 8:44 pm

  3. Doug,

    Thank you for your kind comments. I also like your blog.

    As for the opening page, I cannot find a way to remove the post date. I checked other wordpress blogs, including Jonathan’s, they all show the creation date on the first page. I’ll have to live with it I guess.

    Christian

    Comment by christianbilien — July 16, 2007 @ 9:22 pm

  4. Mmm, maybe it’s something about the specific template that you use, which is the same as Jonathan’s I think?

    Yes, I see what you mean, now. If I go to https://christianbilien.wordpress.com/all-postings then that works.

    However, there are other WordPress blogs that don’t require this, in fact most that I’m aware of don’t.

    Regardless, I’ll just update my links to point to the all-postings URL.

    Cheers

    Comment by Doug Burns — July 16, 2007 @ 9:32 pm

  5. You said:

    “The checkpoints I/Os are doubly asynchronous, because the I/Os are also asynchronous at the DBWR level.”

    Are you saying that DBWR performs delayed (cached) write operations, or have I misunderstood your “doubly asynchronous”? If you are, then it is incorrect: DBWR performs synchronous writes that are not complete until the data are on the disk (or on a battery backed up RAID). Of course, those synchronous writes can be initiated asynchronously!

    Comment by Val Carey — August 25, 2007 @ 11:46 pm

  6. This is why I said “The word asynchronous could be a source of confusion”. Let’s put it this way: asynchronous from the programs’ perspective (it should be at least), and asynchronous at the DBWR level if async I/O is used (use libaio calls = kernelized aio calls on most platforms). Each I/O is synchronous (direct or not).

    Comment by christianbilien — August 26, 2007 @ 7:15 pm

  7. You said:
    “(use libaio calls = kernelized aio calls on most platforms). Each I/O is synchronous (direct or not).”

    Right, but I still do not understand what you meant by “doubly asynchronous”. It would perhaps be useful if you gave an example of just asynchronous vs. doubly asynchronous.

    Thanks.

    Comment by Val Carey — September 7, 2007 @ 2:05 am

  8. Enabling asynchronous I/O is good or not? I don’t think async I/O is available in Windows platforms. Nice maths calculation between I/O size and packet length.

    Comment by M.Venkatesh — December 30, 2013 @ 4:33 pm

