Christian Bilien’s Oracle performance and tuning blog

August 31, 2007

Migrating large, redo intensive databases to RAC using Data Guard

Filed under: Oracle,RAC — christianbilien @ 7:21 am

Here are some thoughts and a couple of issues raised by the handling of a 10TB database migration. This 10gR2 data warehouse had to be migrated from a non-RAC, Solaris UFS file system environment to a three-node ASM RAC. What makes this move a not-so-easy endeavor is not primarily its size, but its redo generation rate of 1GB every minute, 24 hrs a day, 7 days a week, even though some loading is done in NOLOGGING mode. Storage replication was not an option here because of distance and budget. So I went with Data Guard, and found out a couple of things worth sharing:

Database duplication

The key issue I tried to work around was the estimated 30 hours of full backup/restore, coupled with the 1.8TB of redo generated in that time frame, which I would have to transfer and recover on the standby database. Furthermore, the archived logs were backed up and purged every 3 hours to make room.
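A quick back-of-the-envelope check of that figure, assuming the redo rate stays at a steady 1GB per minute over the estimated 30-hour backup/restore window:

```python
# Redo generated during the backup/restore window
redo_rate_gb_per_min = 1                 # ~1 GB of redo per minute (from the post)
window_hours = 30                        # estimated full backup/restore duration
redo_gb = redo_rate_gb_per_min * window_hours * 60
print(redo_gb / 1000, "TB")              # -> 1.8 TB of archived logs to ship and recover
```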

I found this little trick to trigger the Remote File Server (RFS) to start writing on the target standby platform even BEFORE the backup process starts. Here are the steps:

1. After generating a standby control file, I simply mounted a dummy standby database on the target server. That is enough for the RFS to start copying log files.
2. I ran the rman backup on the source
3. When the backup was ready, I had to bounce the target DB into nomount mode to allow an RMAN “duplicate for standby”.

While the target was down, the alert file showed the RFS restart attempts every 5 minutes:

PING[ARC0]: Heartbeat failed to connect to standby ‘RACSTANDBY’. Error is 12528.

4. This is where it gets interesting, because the “duplicate for standby” mounts the standby database, allowing the RFS to restart. The RMAN memory script shows:

{
set until scn 19415505108;
restore clone standby controlfile;

sql clone 'alter database mount standby database';
}

and in the alert file on the target system:

Redo Shipping Client Connected as PUBLIC
— Connected User is Valid
RFS[1]: Assigned to RFS process 27224
RFS[1]: Identified database type as ‘physical standby’
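The sequence above can be sketched as follows. This is an illustrative outline, not a transcript of the actual migration; file locations are hypothetical:

```sql
-- On the primary: generate a standby control file
ALTER DATABASE CREATE STANDBY CONTROLFILE AS '/tmp/standby.ctl';

-- Step 1: on the target, mount a dummy standby database built from that
-- control file; this alone is enough for RFS to start receiving redo
STARTUP MOUNT;

-- Step 2: on the source, run the RMAN backup, e.g.:
-- RMAN> BACKUP DATABASE INCLUDE CURRENT CONTROLFILE FOR STANDBY;

-- Step 3: bounce the target into nomount mode for the duplication
SHUTDOWN IMMEDIATE;
STARTUP NOMOUNT;

-- Step 4: from RMAN, connected to the source (target) and the
-- standby (auxiliary) instances:
-- RMAN> DUPLICATE TARGET DATABASE FOR STANDBY;
```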

Caveats:

1. Allow enough space on the target for the copied log files, so that the restoration and the recovery can complete.
2. A log archive gap was created in step 3: RMAN mounted the standby and started copying, but some archived log files had to be copied manually (although the unchanged FAL configuration worked fine after the Data Guard setup).
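To spot and close such a gap on the standby, something along these lines can be used (the log file name is hypothetical):

```sql
-- On the standby: check for an archive gap left while the target was down
SELECT thread#, low_sequence#, high_sequence# FROM v$archive_gap;

-- Missing archived logs can then be copied over manually and registered:
-- ALTER DATABASE REGISTER LOGFILE '/arch/arch_1_12345.arc';
```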

ASM restoration bug

I used Metalink note 273015.1 as a starting point. You have to add to the primary (non-RAC) database two new logfile threads and two additional undo tablespaces for the second and third RAC instances, and run catclust.sql. What this note does not mention is the

ORA-00600: internal error code, arguments: [kjrfr7], [], [], [], [], [], [], []

you get in the ASM instance when you attempt to restore the database (usually after several hours, grrr). This turned out to be the unpublished bug 4750469, fixed in 10.2.0.3, which only came out on August 13th on Solaris x86-64, the platform I was working on. The bug can occur when Oracle does a sanity check of a resource whose memory has been released to the shared pool. I worked around the bug by “unclustering” the DB (cluster_database=false) at both the ASM and the DB level during the restoration phase, then setting cluster_database back to true.
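The workaround boils down to a parameter flip in both the ASM and database instances; an illustrative sketch (cluster_database is a static parameter, so each change needs an instance restart):

```sql
-- Before the restore: fall back to a single-instance configuration,
-- in the ASM instance and in the database instance
ALTER SYSTEM SET cluster_database = FALSE SCOPE=SPFILE;
-- (restart the instances, then run the RMAN restore/recover)

-- Once the restoration is complete, re-enable RAC
ALTER SYSTEM SET cluster_database = TRUE SCOPE=SPFILE;
```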

August 16, 2007

Workload characterization for the uninformed capacity planner

Filed under: HP-UX,Models and Methods,Oracle,Solaris,Storage — christianbilien @ 7:32 pm

Doug Burns initiated an interesting thread a while ago about user and application workloads, their meanings and the difficulties associated with their determination. Workload characterization is both essential and probably the hardest and most error-prone part of the whole forecasting process. Models that fail to validate (i.e. are not usable) most of the time fall into one of these categories:

• The choice of characteristics and parameters is not relevant enough to describe the workloads and their variations
• The analysis and reduction of performance data was incorrect
• Data collection errors, misinterpretations, etc.

Unless you already know the business environment and the applications, or some previous workload characterization is already in place, you are facing a blank page. You can always try to do a smart workload partition along functional lines, but this effort is unfortunately often preposterous and doomed to failure because of time constraints. So what can be done?

I find clustering analysis a good compromise between time to deliver and business relevance. Caveat: this method ignores any data cache (storage array, Oracle and file system caches, etc.) and locks/latches or any other waits unrelated to resource waits.

A simple example will explain how it works:

Let’s assume that we have a server with a single CPU and a single I/O path to a disk array. We’ll represent each transaction running on our server by a couple of attributes: the service time each of these transactions requires from the two physical resources. In other words, each transaction will require in absolute terms a given number of seconds of presence on the disk array and another number of seconds on the CPU. We’ll call a required service time a “demand on a service center” to avoid confusion. The sum of those two values would represent the response time on an otherwise empty system, assuming no interaction occurs with any other external factor. As soon as you start running concurrent transactions, you introduce on one hand waits on locks, latches, etc. and on the other hand queues on the resources: the sum of the demands is no longer the response time. Any transaction may of course visit each resource several times: the sum of the times spent using each service center will simply equal the demand.

Let us consider that we are able to collect the demands each single transaction j requires from our two resource centers. We’ll name
${D}_{j1}$ the CPU demand and ${D}_{j2}$ the disk demand of transaction j. Transaction j can now be represented by a two-component workload: ${w}_{j}=({D}_{j1},{D}_{j2})$. Let’s now start the collection: we’ll collect over time every ${w}_{j}$ that goes through the system. Below is a real 300-point collection on a Windows server. I cheated a little bit because there are four CPUs on this machine, but we’ll just say a single queue represents the four CPUs.

The problem is now obvious: there is no natural grouping of transactions with similar requirements. Another attempt can be made using natural (Napierian) logarithms to distort the scales:

This is not good enough either to identify meaningful workloads.

The Minimum Spanning Tree (MST) method can be used to perform successive fusions of data until the wanted number of representative workloads is obtained. It begins by considering each workload point to be a cluster of its own. Next, the two clusters with the minimum distance are fused into a single cluster. The process iterates until the desired number of clusters is reached.

• Distance: let’s assume two workloads represented by ${w}_{i}=({D}_{i1},{D}_{i2},...,{D}_{iK})$ and ${w}_{j}=({D}_{j1},{D}_{j2},...,{D}_{jK})$. I moved from just two attributes per workload to K attributes, which will correspond to service times at K service centers. The Euclidean distance between the two workloads will be $d=\sqrt{\sum_{n=1}^{K}{({D}_{in}-{D}_{jn})}^{2}}$.
• Each cluster is represented at each iteration by its centroid whose parameter values are the means of the parameter values of all points in the cluster.
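The fusion process above can be sketched in a few lines of Python. This is a naive illustration of the iterative fusion (distance between centroids, fuse the closest pair, recompute), not the exact tool used for the 300-point collection:

```python
import math

def distance(wi, wj):
    """Euclidean distance between two workload demand vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(wi, wj)))

def centroid(cluster):
    """Mean of each demand component over the points of a cluster."""
    k = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(k))

def fuse_clusters(points, target):
    """Repeatedly fuse the two closest clusters until `target` remain."""
    clusters = [[p] for p in points]
    while len(clusters) > target:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # fuse j into i
        del clusters[j]
    return clusters

# Four (CPU demand, disk demand) points reduced to 2 workloads
workloads = fuse_clusters([(1, 1), (1.1, 1.0), (5, 5), (5.2, 4.9)], 2)
print([centroid(c) for c in workloads])
```

Each surviving cluster's centroid is then the representative workload fed into the model.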

Below is a 20-point reduction of the 300 initial points. In real life, thousands of points are used to avoid outliers and to average the transactions.

August 13, 2007

Summer time statistics

Filed under: Non-technical,Off topic — christianbilien @ 9:47 am

Just back from some long, laptop-free vacations. I’ll slowly resume blogging as soon as I have cleared the pile of unread mail. A couple of casual news items before getting back to more serious matters:

My blog just reached 10,000 hits this morning, a modest but still rewarding accomplishment for 7 months of existence. I had an audience bump a few weeks ago when some excellent, well-known bloggers and book writers (Jeff Moss, Doug Burns, Jonathan Lewis, Jeremy Schneider; I hope those I did not name won’t be offended) were kind enough to reference my blog. The RAC entries get by far the most hits, followed by the PGA and storage array related posts. The Solaris DISM/ISM posts are also in the top league. Most comments are targeted at storage array cache and RAID matters.

I found in my post office mailbox the awaited “Forecasting Oracle Performance” by Craig Shallahamer, published by Apress. I indulge a craving for mathematical forecasting, so I hope this book bridges the gap between Oracle’s own data (AWR and ASH) and queuing theory.
