Christian Bilien’s Oracle performance and tuning blog

August 16, 2007

Workload characterization for the uninformed capacity planner

Filed under: HP-UX,Models and Methods,Oracle,Solaris,Storage — christianbilien @ 7:32 pm

Doug Burns initiated an interesting thread a while ago about user or application workloads, their meanings and the difficulties associated with their determination. But workload characterization is both essential and probably the hardest and most error-prone part of the whole forecasting process. Models that fail to validate (i.e. are not usable) most of the time fall into one of these categories:

  • The choice of characteristics and parameters is not relevant enough to describe the workloads and their variations
  • The analysis and reduction of performance data was incorrect
  • Data collection errors, misinterpretations, etc.

Unless you already know the business environment and the applications, or some previous workload characterization is already in place, you are facing a blank page. You can always try to partition the workload cleverly along functional lines, but this effort is unfortunately often unrealistic and doomed to failure because of time constraints. So what can be done?

I find clustering analysis a good compromise between time to deliver and fidelity to the business transactions. Caveat: this method ignores any data caches (storage array, Oracle and file system caches, etc.) and locks/latches or any other waits unrelated to resource usage.

A simple example will explain how it works:

Let’s assume that we have a server with a single CPU and a single I/O path to a disk array. We’ll represent each transaction running on our server by a couple of attributes: the service time each of these transactions requires from the two physical resources. In other words, each transaction will require in absolute terms a given number of seconds of presence on the disk array and another number of seconds on the CPU. We’ll call a required service time a “demand on a service center” to avoid confusion. The sum of those two values would represent the response time on an otherwise empty system, assuming no interaction occurs with any other external factor. As soon as you start running concurrent transactions, you introduce on one hand waits on locks, latches, etc. and on the other hand queues on the resources: the sum of the demands is no longer the response time. Any transaction may of course visit each resource several times: the sum of the times spent using each service center will simply equal the demand.
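As a toy illustration of this bookkeeping (the numbers below are invented, not measured), the demands of a single transaction and its response time on an empty system could look like this:

```python
# Invented demands for one transaction on the two service centers,
# summed over all of its visits to each resource.
demand_cpu_s = 0.15    # seconds of CPU service required
demand_disk_s = 0.35   # seconds of disk-array service required

# On an otherwise empty system (no queueing, no lock or latch waits),
# the response time is simply the sum of the two demands.
print(demand_cpu_s + demand_disk_s)   # 0.5 s
```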

Let us consider that we are able to collect the demands each single transaction j requires from our two service centers. We’ll name D_{j1} the CPU demand and D_{j2} the disk demand of transaction j. Transaction j can now be represented by a two-component workload: w_j = (D_{j1}, D_{j2}). Let’s now start the collection: we’ll collect over time every w_j that runs on the system. Below is a real 300-point collection on a Windows server. I cheated a little bit because there are four CPUs on this machine, but we’ll just say a single queue represents the four CPUs.

[Figure T1: the 300 collected workload points, CPU demand vs. disk demand]

The problem is now obvious: there is no natural grouping of transactions with similar requirements. Another attempt can be made using natural (Napierian) logarithms to distort the scales:

[Figure t2: the same 300 points with both axes on a natural-logarithm scale]

This is not good enough either to identify meaningful workloads.

The Minimum Spanning Tree (MST) method can be used to perform successive fusions of data until the wanted number of representative workloads is obtained. It begins by considering each collected workload to be a cluster of its own. Next, the two clusters with the minimum distance are fused into a single cluster. The process iterates until the desired number of clusters is reached.

  • Distance: let’s assume two workloads represented by w_i = (D_{i1}, D_{i2}, ..., D_{iK}) and w_j = (D_{j1}, D_{j2}, ..., D_{jK}). I moved from just two attributes per workload to K attributes, which correspond to service times at K service centers. The Euclidean distance between the two workloads is then d = \sqrt{\sum_{n=1}^{K} (D_{in} - D_{jn})^2}.
  • Each cluster is represented at each iteration by its centroid whose parameter values are the means of the parameter values of all points in the cluster.

    Below is a 20-point reduction of the 300 initial points; a minimal code sketch of the procedure is shown after the figure. In real life, thousands of points are used to smooth out outliers and average the transactions.

    [Figure t3: the 20 representative workloads (cluster centroids) obtained from the 300 initial points]
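To make the reduction concrete, here is a minimal sketch under stated assumptions: the demand data is synthetic (the real collection is not reproduced here), scipy’s single-linkage hierarchical clustering stands in for the MST fusion, and each final cluster is summarized by its centroid. Variable names and parameters are illustrative, not the tooling used for the figures above.

```python
# Minimal sketch of the MST-style reduction (assumptions: synthetic lognormal
# demands, scipy single linkage as the MST-equivalent fusion, 20 clusters).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# 300 synthetic transactions, each a workload w_j = (D_j1, D_j2):
# CPU demand and disk demand in seconds.
demands = np.column_stack([
    rng.lognormal(mean=-3.0, sigma=1.0, size=300),   # D_j1: CPU demand (s)
    rng.lognormal(mean=-2.5, sigma=1.2, size=300),   # D_j2: disk demand (s)
])

# Optional natural-log transform, as tried above to spread the scales.
points = np.log(demands)

# Successive fusions of the two closest clusters (Euclidean distance)
# until 20 clusters remain.
Z = linkage(points, method="single", metric="euclidean")
labels = fcluster(Z, t=20, criterion="maxclust")

# Each representative workload is the centroid of its cluster, expressed
# back in seconds of demand per service center.
for k in np.unique(labels):
    members = demands[labels == k]
    centroid = members.mean(axis=0)
    print(f"cluster {k:2d}: {len(members):3d} transactions, "
          f"D_cpu={centroid[0]:.4f} s, D_disk={centroid[1]:.4f} s")
```

Single linkage fuses, at each step, the two clusters joined by the shortest MST edge; centroid or average linkage would match the centroid description above more literally, at the cost of a different fusion order.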

5 Comments »

  1. Seems to be a problem with your formulas and images.

    Comment by Anonymous — August 17, 2007 @ 1:51 pm

  2. Thanks for the tip.

    Looks like the WordPress latex rendering does not work this afternoon. http://faq.wordpress.com/2007/02/18/can-i-put-math-or-equations-in-my-posts/ or http://wordpress.com/blog/2007/02/17/math-for-the-masses/ do not render the formulas. Some but not all seem to work in Firefox, but IE won’t show any of them.

    Comment by christianbilien — August 17, 2007 @ 2:33 pm

  3. Hello,
    thanks for the nice article!

    How do you get the demands D_cpu and D_io for every transaction running in an Oracle DB? Is Statspack the source?

    Thanks

    Comment by Anonymous — November 12, 2007 @ 9:41 am

  4. Hi,

    The easiest is to use v$sesstat/v$sysstat to get user commits, CPU used when call started and CPU used by this session, physical reads and physical writes. Physical writes are the trickiest because of the delayed writes, which also need to be accounted for. Very often this is not a problem, as the database has a large read-to-write ratio. (A minimal sketch of this collection follows the comments below.)

    Cheers

    Christian

    Comment by christianbilien — November 12, 2007 @ 7:25 pm

  5. Hi Christian,

    Thanks for the interesting thread. I want to do further reading on this subject area: as a capacity planner, I need to characterize the workload. Could you please give me some references?
    I am looking to characterize Oracle database workload data (some metrics are logical reads/sec, user calls/sec, executions/sec) and then correlate the workload classes with utilization (CPU utilization, I/O utilization), but it is not clear to me how to proceed.

    Thanks,

    Comment by Neeraj Bhatia — October 23, 2008 @ 2:58 pm
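Following up on comment 4, here is a minimal sketch of that collection, under stated assumptions: the python-oracledb client, system-wide v$sysstat counters (v$sesstat would give the per-session equivalent), a 5-minute snapshot interval, and ‘CPU used by this session’ reported in centiseconds. Connection details are placeholders.

```python
# Minimal sketch (assumptions: python-oracledb client, access to v$sysstat,
# 'CPU used by this session' reported in centiseconds).
import time
import oracledb

STATS = ("user commits", "CPU used by this session",
         "physical reads", "physical writes")

def snapshot(cur):
    """Return the current values of the selected v$sysstat counters."""
    cur.execute(
        "SELECT name, value FROM v$sysstat WHERE name IN "
        "('user commits', 'CPU used by this session', "
        "'physical reads', 'physical writes')")
    return dict(cur.fetchall())

# Hypothetical connection details.
conn = oracledb.connect(user="perf", password="secret", dsn="dbhost/orcl")
cur = conn.cursor()

before = snapshot(cur)
time.sleep(300)                      # measurement interval
after = snapshot(cur)

delta = {name: after[name] - before[name] for name in STATS}
commits = max(delta["user commits"], 1)

# Rough average demands per committed transaction over the interval.
cpu_demand_s = delta["CPU used by this session"] / 100.0 / commits  # cs -> s
reads_per_txn = delta["physical reads"] / commits                   # blocks read
writes_per_txn = delta["physical writes"] / commits                 # blocks written (delayed writes included)

print(f"avg CPU demand   : {cpu_demand_s:.4f} s/txn")
print(f"avg physical I/O : {reads_per_txn:.1f} reads/txn, {writes_per_txn:.1f} writes/txn")
```

Dividing the deltas by the commit count gives average demands per committed transaction; turning the block counts into seconds of disk service would additionally require an average service time per I/O.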

