Christian Bilien’s Oracle performance and tuning blog

January 28, 2008

An upper bound of the transactions throughputs

Filed under: Models and Methods — christianbilien @ 9:46 pm

The fundamental laws of capacity planning are seldom used to identify benchmark flaws, although some of them are almost trivial. Worse, some performance assessments comment on each performance figure individually without realizing that physical laws bind these figures together.

Perhaps the simplest of them all is the Utilization law, which states that the utilization of a resource is equal to the product of its throughput and its average service time. The utilization is the fraction of time the resource is busy serving requests. CPU utilization is given by sar -u. The utilization of individual disks in a storage array is reported by the storage vendor's proprietary tools, while sar -d or iostat can be used to collect data for internal disks.

Take a disk serving a fairly steady load, which I picked up on an HP-UX test system with no storage array attached. sar -d gave the following data:

Time   Utilization (%)  Service time (ms)  Reads/s  Writes/s
15:07  70.8             10                 4        67
15:12  67.3             12.3               30.6     24.1
15:17  67.8             12                 33.7     22.7

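The law can be verified for the three samples I picked up, with the service time expressed in seconds:

Utilization = (reads/s + writes/s) x service time

For the 15:07 sample, (4 + 67) x 0.010 s = 0.71, i.e. 71%, against the 70.8% reported.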
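The same check can be scripted for all three samples. This is only a minimal sketch: the numbers are copied from the sar -d output above, and the names (samples, utilization_pct) are mine, not part of any tool.

```python
# Samples copied from the sar -d table:
# (time, reported utilization %, service time in ms, reads/s, writes/s)
samples = [
    ("15:07", 70.8, 10.0, 4.0, 67.0),
    ("15:12", 67.3, 12.3, 30.6, 24.1),
    ("15:17", 67.8, 12.0, 33.7, 22.7),
]


def utilization_pct(service_time_ms, reads_per_s, writes_per_s):
    """Utilization law: U = throughput x service time, returned in percent."""
    throughput = reads_per_s + writes_per_s              # I/Os per second
    busy_fraction = throughput * service_time_ms / 1000  # ms converted to s
    return busy_fraction * 100


# Pair each reported figure with the one derived from the law.
computed = [
    (time, reported, utilization_pct(svc_ms, reads, writes))
    for time, reported, svc_ms, reads, writes in samples
]

for time, reported, derived in computed:
    print(f"{time}: reported {reported:.1f}%, derived {derived:.1f}%")
```

The derived values should agree with the reported ones to within a fraction of a percentage point; the residual comes from the rounding of the sar figures.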

This law can be used to define an asymptotic bound, which is an optimistic bound since it indicates the best possible performance. If each application transaction spends D_{k} seconds on disk k, and X is the application throughput, the utilization law can be rewritten for disk k as U_{k} = X * D_{k}. An increase in the arrival rate can be accommodated as long as none of the disks is saturated (i.e. has a utilization of 100%). The throughput bound X_{max} is therefore the arrival rate at which the first disk saturates. If D_{max} is the largest of the per-transaction disk times D_{k}, the upper bound on the transaction throughput is reached when the corresponding disk has a utilization of 1 (100%):


D_{max} * X_{max} = 1

therefore

X_{max} = \frac{1}{D_{max}}

Let’s replace our disk by a single volume which encompasses a whole raid group inside an array, and consider that this raid group is dedicated to a single batch. Other raid groups participate in the transactions but we’ll focus on the most accessed one. If our transaction needs to make 10 synchronous visits (meaning each of them has to wait for the previous one to complete) to the most accessed volume in the storage array, and each of the visits “costs” 10ms, we’ll have D_{max} = 100ms = 0.1s. The best possible throughput we can get is 10 transactions per second.
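The bound is easy to compute once the per-transaction demand on each volume is known. The sketch below uses the 10 visits of 10ms from the example; the second volume and its 40ms demand are purely hypothetical, added only to show that the busiest volume alone sets the bound.

```python
def throughput_bound(demands_s):
    """Return X_max = 1 / D_max, given per-transaction demands in seconds."""
    d_max = max(demands_s.values())
    return 1.0 / d_max


visits = 10             # synchronous visits per transaction (from the example)
service_time_s = 0.010  # 10ms per visit (from the example)

demands = {
    "busiest_volume": visits * service_time_s,  # D_max = 0.1s
    "other_volume": 0.040,                      # hypothetical, for illustration
}

x_max = throughput_bound(demands)
print(f"Upper bound: {x_max:.1f} transactions per second")
```

Lowering the demand on the other volume changes nothing here: only a reduction of the visits to, or the service time of, the busiest volume raises the bound.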

 

 

 

January 19, 2008

Oracle’s clusterware real time priority oddity

Filed under: Oracle,RAC — christianbilien @ 6:16 pm


The CSS processes use both the interconnect and the voting disks to monitor remote nodes. A node must be able to access strictly more than half of the voting disks at any time (this is the reason for the odd number of voting disks), which prevents split brain. Let’s just recall that split brains are encountered when several cluster “islands” are created without being aware of each other. Oracle uses a modified STONITH (Shoot The Other Node In The Head) algorithm: instead of being able to fail other nodes, one node can merely instruct the other nodes to commit suicide.

This subtlety has far-reaching consequences: the clusterware software on each node MUST be able to coordinate its own actions in any case without relying upon the other nodes. There is an obvious potential problem when the clusterware processes cannot get the CPU in a timely manner, especially as a lot of the cssd code runs in user mode. This can be overcome by raising the cssd priority, which the 10.2.0.2 release made possible by setting the css priority in the OCR:

crsctl set css priority 4

Meaning of priority 4
You’ll see on Solaris in /etc/init.d/init.cssd the priority boost mechanism which corresponds to the values you can pass to crsctl set css priority:

PRIORITY_BOOST_DISABLED=0
PRIORITY_BOOST_LOW=1
PRIORITY_BOOST_MID=2
PRIORITY_BOOST_HIGH=3
PRIORITY_BOOST_REALTIME=4
PRIORITY_BOOST_RENICE_LOW=-5
PRIORITY_BOOST_RENICE_MID=-13
PRIORITY_BOOST_RENICE_HIGH=-20
PRIORITY_BOOST_RENICE_REALTIME=0
PRIORITY_BOOST_ENABLED=1
PRIORITY_BOOST_DEFAULT=$PRIORITY_BOOST_HIGH

A bit further down the file:

RTGPID='/bin/priocntl -s -c RT -i pgid'

Further down:

  if [ $PRIORITY_BOOST_ENABLED = '1' ]; then
    NODENAME=`$CRSCTL get nodename`     # check to see the error codes
    case $? in
    0)
      # since we got the node name, now try toget the actual
      # boost value for the node
      PRIORITY_BOOST_VALUE=`$CRSCTL get css priority node $NODENAME`   # <= retrieves the PRIORITY_BOOST_VALUE

Still further down:

    case $PRIORITY_BOOST_VALUE in
      $PRIORITY_BOOST_LOW)
        # low priority boost
        $RENICE $PRIORITY_BOOST_RENICE_LOW -p $$
        ;;
      $PRIORITY_BOOST_MID)
        # medium level boost
        $RENICE $PRIORITY_BOOST_RENICE_MID -p $$
        ;;
      $PRIORITY_BOOST_HIGH)
        # highest level normal boost
        $RENICE $PRIORITY_BOOST_RENICE_HIGH -p $$
        ;;
      $PRIORITY_BOOST_REALTIME)
        # realtime boost only should be used on platforms that support this
        $RTGPID $$   # <= realtime
        ;;

So setting a priority of 4 should set the current shell and its children to RT using priocntl.
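The mapping implemented by these excerpts can be summarized as follows. This is a sketch of my reading of the script, not Oracle code: the values come from the excerpts above, "renice" stands in for the script's $RENICE variable (whose definition is not shown here), and values the excerpts do not cover simply return None.

```python
# crsctl priority value -> renice increment, from the PRIORITY_BOOST_* lines.
RENICE_BY_BOOST = {
    1: -5,    # PRIORITY_BOOST_LOW  -> PRIORITY_BOOST_RENICE_LOW
    2: -13,   # PRIORITY_BOOST_MID  -> PRIORITY_BOOST_RENICE_MID
    3: -20,   # PRIORITY_BOOST_HIGH -> PRIORITY_BOOST_RENICE_HIGH
}
REALTIME_BOOST = 4  # PRIORITY_BOOST_REALTIME


def boost_command(priority, pid):
    """Return the command init.cssd would run on itself, as read above."""
    if priority in RENICE_BY_BOOST:
        # "renice" is an assumption standing in for $RENICE.
        return f"renice {RENICE_BY_BOOST[priority]} -p {pid}"
    if priority == REALTIME_BOOST:
        # RTGPID as defined in the script.
        return f"/bin/priocntl -s -c RT -i pgid {pid}"
    return None  # not covered by the excerpts shown


for value in (1, 2, 3, 4):
    print(value, "->", boost_command(value, 1510))
```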

… but the cssd daemons do not run under RT

I noticed this oddity in the 10.2.0.2 release on Solaris 8 following a RAC node reboot under a (very) heavy CPU and memory load. The trace file contains:

[ CSSD]2008-01-03 18:49:04.418 [11] >WARNING: clssnmDiskPMT: sltscvtimewait timeout (282535)
[ CSSD]2008-01-03 18:49:04.428 [11] >TRACE: clssnmDiskPMT: stale disk (282815 ms) (0//dev/vx/rdsk/racdg/vote_vol)
[ CSSD]2008-01-03 18:49:04.428 [11] >ERROR: clssnmDiskPMT: 1 of 1 voting disk unavailable (0/0/1)

This is a timeout after polling the voting disk for 282.535 s (282,535 ms). I/O errors were reported neither in /var/adm/messages nor by the storage array.

The priority is unset by default:

$crsctl get css priority
Configuration parameter priority is not defined.
 
$crsctl set css priority 4
Configuration parameter priority is now set to 4.

This is written into the ocr registry (from ocrdump):

[…]
[SYSTEM.css.priority]
UB4 (10) : 4
SECURITY : {USER_PERMISSION : PROCR_ALL_ACCESS, GROUP_PERMISSION : PROCR_READ, OTHER_PERMISSION : PROCR_READ, USER_NAME : root, GROUP_NAME : root}
[…]

Look at the priorities:

$/crs/10.2.0/bin # ps -efl | grep cssd | grep -v grep | grep -v fatal
 0 S     root  1505  1257   0  40 20        ?    297        ?   Dec 27 ?           0:00 /bin/sh /etc/init.d/init.cssd oproc
 0 S     root  1509  1257   0  40 20        ?    298        ?   Dec 27 ?           0:00 /bin/sh /etc/init.d/init.cssd oclso
 0 S     root  1510  1257   0  40 15        ?    298        ?   Dec 27 ?           0:00 /bin/sh /etc/init.d/init.cssd daemo
 0 S   oracle  1712  1711   0  40 15        ?   9972        ?   Dec 27 ?          71:52 /crs/10.2.0/bin/ocssd.bin

The NI column (the field right after the priority of 40) shows the nice value, which has been changed from 20 to 15 for the init.cssd daemon script and ocssd.bin. But 40 is still a TimeShare priority: init.cssd and ocssd.bin are less prone to preemption, yet still exposed to high loads.

oprocd is the only css process to run in Real Time, whatever the priority setting. Its purpose is to detect system hangs:

ps -efl| grep oprocd
0 S     root 27566 27435   0   0 RT        ?    304       ?   Nov 17 console     9:49 /crs/10.2.0/bin/oprocd.bin run -t 1
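The check can be automated by reading the NI field of the ps -efl lines. The sketch below relies on what the outputs above show, namely that the eighth field holds the nice value for TimeShare processes and the string RT for oprocd; I have not verified this layout on other platforms or releases.

```python
# Lines copied from the ps -efl outputs above.
PS_LINES = [
    "0 S     root  1510  1257   0  40 15        ?    298        ?   Dec 27 ?           0:00 /bin/sh /etc/init.d/init.cssd daemo",
    "0 S   oracle  1712  1711   0  40 15        ?   9972        ?   Dec 27 ?          71:52 /crs/10.2.0/bin/ocssd.bin",
    "0 S     root 27566 27435   0   0 RT        ?    304       ?   Nov 17 console     9:49 /crs/10.2.0/bin/oprocd.bin run -t 1",
]


def pid_and_ni(line):
    """Return (pid, NI field); layout: F S UID PID PPID C PRI NI ..."""
    fields = line.split()
    return int(fields[3]), fields[7]


def realtime_pids(lines):
    """PIDs whose NI field reads RT instead of a nice value."""
    return [pid for pid, ni in map(pid_and_ni, lines) if ni == "RT"]


rt_pids = realtime_pids(PS_LINES)
print("Real Time processes:", rt_pids)
```

On the sample lines, only the oprocd.bin process is reported.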

The css priorities remained unchanged after stopping and restarting the crs. Part of the above behavior is described in Metalink note:4127644.8 where it is filed as a bug. I did not test crsctl set css priority on a 10.2.0.3. The priorities are identical to what they are on 10.2.0.2 when unset.

January 11, 2008

Tag flood

Filed under: Off topic — christianbilien @ 4:08 pm

 

Amid the tag flurry, I was tagged some days ago by Jeff Moss, so I’ll have to give some pieces of information about myself, presumably of low interest to most. I had not put any personal information on my blog, so here are 8 of them, which I’ll keep short anyway:

  • I come from a small town in Brittany, France, located almost as far west as conceivable before falling off a cliff. My parents spoke Breton before they learned French at school.
  • I was an undecided student, who spent 6 years studying Mathematics and Physics followed by a year in management studies before realizing that what he was interested in was History.
  • So I read a lot, predominantly History and current affairs books. The last book I read: The Great War for Civilization by Robert Fisk (highly controversial in the USA, less in the UK).
  • I volunteered to join the army paratroopers (national service was mandatory for boys only at the time, how unjust), a formative experience to say the least.
  • I love performance tuning and modeling but I am not a geek as such. I have no interest whatsoever in “intelligent” phones, PDAs, games, and so on.
  • I feel depressed unless I run at least 40km a week, and I also love diving, sailing and swimming.
  • I love traveling, especially on foot in the deserts. The landscapes that most impressed me were in the Namib Desert. Most unsettling experience abroad: living in the UK for two years (I’m joking here – I had a great time and was feeling half English when I left).
  • Last bullet, well what else could I say? How about “I hate DIY, much to my wife’s dismay”?

Well, thanks for reaching this point.

 

I am quite a latecomer to the tag thing: I have a feeling that all the Oracle blogs I read have already been tagged, so I’ll stop the chain here.

 

 

January 6, 2008

Where has all my memory gone ?

Filed under: Oracle — christianbilien @ 8:09 pm

A while ago, I came across an interesting case of memory starvation on an Oracle DB server running Solaris 8 that was for once not directly related to the SGA or the PGA. The problem showed up from a user perspective as temporary “hangs” that only seemed to happen at a specific time of the day. This server is dedicated to a single 10gR2 Oracle instance. Looking at the OS figures, the first things that I saw were some vmstat signs of memory pressure: a high number of page reclaims, 200 to 500 MB of free memory left out of 16GB, and 2000 to 3000 pages/s scanned by the page scanner.

Look at how memory is allocated using prtmem:

Total memory:           15614 Megabytes
Kernel Memory:           1534 Megabytes
Application:            12888 Megabytes
Executable & libs:        110 Megabytes
File Cache:               410 Megabytes
Free, file cache:         250 Megabytes
Free, free:               430 Megabytes

But look at the Oracle SGA and PGA:

SGA:

Total System Global Area 6442450944 bytes
Fixed Size                  2038520 bytes
Variable Size            3489662216 bytes
Database Buffers         2936012800 bytes
Redo Buffers               14737408 bytes

PGA:

  select * from v$pgastat
 NAME                                                                  VALUE UNIT
---------------------------------------------------------------- ---------- ------------
aggregate PGA target parameter                                   1048576000 bytes
aggregate PGA auto target                                          65536000 bytes
global memory bound                                                 2258944 bytes
total PGA inuse                                                  1181100032 bytes
total PGA allocated                                              2555433984 bytes
maximum PGA allocated                                            2838683648 bytes
total freeable PGA memory                                         755367936 bytes
process count                                                          1943
max processes count                                                    2273
PGA memory freed back to OS                                      2.5918E+11 bytes
total PGA used for auto workareas                                   9071616 bytes

Well, at this point that’s about 7.5GB (again out of 16GB) for the PGA currently in use plus the SGA allocation. I assumed here that, because of the memory pressure, the unused part of the PGA was already paged out. Yet prtmem showed an “application” memory size of 12.9GB.
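The accounting can be redone from the raw figures above. This is only a sketch: it assumes that prtmem megabytes are 2**20 bytes, and the variable names are mine. Depending on rounding and on binary versus decimal units, the totals come out in the same range as the rounded figures quoted in the text.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

sga_bytes = 6442450944         # "Total System Global Area"
pga_in_use_bytes = 1181100032  # v$pgastat "total PGA inuse"
application_mb = 12888         # prtmem "Application" line

# Memory that the SGA and the PGA in use account for.
accounted_bytes = sga_bytes + pga_in_use_bytes

# Assumption: prtmem reports megabytes of 2**20 bytes.
application_bytes = application_mb * MIB

# Application memory that neither the SGA nor the PGA in use explains.
unexplained_bytes = application_bytes - accounted_bytes

print(f"SGA + PGA in use: {accounted_bytes / GIB:.2f} GiB")
print(f"Application:      {application_bytes / GIB:.2f} GiB")
print(f"Unexplained:      {unexplained_bytes / GIB:.2f} GiB")
```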

Where have the 5.4GB gone?

I looked at the structure of one of the processes using pmap -x:

pmap -x 14816
         Address   Kbytes Resident Shared Private Permissions       Mapped File
0000000100000000  100536   44600   44600       - read/exec         oracle
000000010632C000     816     560     368     192 read/write/exec   oracle
00000001063F8000     912     904       -     904 read/write/exec     [ heap ]
0000000380000000   16384   16384   16384       - read/write/exec/shared  [ ism shmid=0x5004 ]
00000003C0000000 3145728 3145728 3145728       - read/write/exec/shared  [ ism shmid=0x2005 ]
0000000480000000 3129360 3129360 3129360       - read/write/exec/shared  [ ism shmid=0x11007 ]
FFFFFFFF7B270000     128      48       -      48 read/write          [ anon ]
FFFFFFFF7B300000      64      64       -      64 read/write          [ anon ]
FFFFFFFF7B310000     448     328       -     328 read/write          [ anon ]
FFFFFFFF7B400000       8       8       8       - read/write/exec/shared   [ anon ]
FFFFFFFF7B500000       8       8       -       8 read/write/exec     [ anon ]
FFFFFFFF7B600000      16      16      16       - read/exec         libmp.so.2
FFFFFFFF7B704000       8       8       -       8 read/write/exec   libmp.so.2
FFFFFFFF7B800000       8       8       -       8 read/write/exec     [ anon ]
FFFFFFFF7B900000     216     216     216       - read/exec         libm.so.1
FFFFFFFF7BA34000      16      16       -      16 read/write/exec   libm.so.1
FFFFFFFF7BB00000      24      24      24       - read/exec         librt.so.1
FFFFFFFF7BC06000       8       8       -       8 read/write/exec   librt.so.1
FFFFFFFF7BD00000      32      32      32       - read/exec         libaio.so.1
FFFFFFFF7BE08000       8       8       -       8 read/write/exec   libaio.so.1
FFFFFFFF7BF00000     728     728     728       - read/exec         libc.so.1
FFFFFFFF7C0B6000      56      56       -      56 read/write/exec   libc.so.1
FFFFFFFF7C0C4000       8       8       -       8 read/write/exec   libc.so.1
FFFFFFFF7C100000       8       8       -       8 read/write/exec     [ anon ]
FFFFFFFF7C200000       8       8       8       - read/exec         libsched.so.1
FFFFFFFF7C302000       8       8       -       8 read/write/exec   libsched.so.1
FFFFFFFF7C400000       8       8       -       8 read/write/exec   libdl.so.1
FFFFFFFF7C500000      32      24      24       - read/exec         libgen.so.1
FFFFFFFF7C608000       8       8       -       8 read/write/exec   libgen.so.1
FFFFFFFF7C700000      56      56      56       - read/exec         libsocket.so.1
FFFFFFFF7C80E000      16      16       -      16 read/write/exec   libsocket.so.1
FFFFFFFF7C900000     672     672     672       - read/exec         libnsl.so.1
FFFFFFFF7CAA8000      64      64       -      64 read/write/exec   libnsl.so.1
FFFFFFFF7CAB8000      32      32       -      32 read/write/exec   libnsl.so.1
FFFFFFFF7CB00000       8       8       -       8 read/write/exec     [ anon ]
FFFFFFFF7CC00000       8       8       8       - read/exec         libkstat.so.1
FFFFFFFF7CD02000       8       8       -       8 read/write/exec   libkstat.so.1
FFFFFFFF7CE00000    2176     376     376       - read/exec         libnnz10.so
FFFFFFFF7D11E000     240     232      16     216 read/write/exec   libnnz10.so
FFFFFFFF7D15A000       8       -       -       - read/write/exec   libnnz10.so
FFFFFFFF7D200000      72      72      72       - read/exec         libdbcfg10.so
FFFFFFFF7D310000       8       8       -       8 read/write/exec   libdbcfg10.so
FFFFFFFF7D400000    1056     112     112       - read/exec         libclsra10.so
FFFFFFFF7D606000      48      32       -      32 read/write/exec   libclsra10.so
FFFFFFFF7D612000       8       -       -       - read/write/exec   libclsra10.so
FFFFFFFF7D700000       8       8       -       8 read/write/exec     [ anon ]
FFFFFFFF7D800000    9256    3272    3272       - read/exec         libjox10.so
FFFFFFFF7E208000     560     472       8     464 read/write/exec   libjox10.so
FFFFFFFF7E300000    1056     208     208       - read/exec         libocrutl10.so
FFFFFFFF7E506000      56      48      16      32 read/write/exec   libocrutl10.so
FFFFFFFF7E514000       8       -       -       - read/write/exec   libocrutl10.so
FFFFFFFF7E600000    1256     136     136       - read/exec         libocrb10.so
FFFFFFFF7E838000      64      56       -      56 read/write/exec   libocrb10.so
FFFFFFFF7E848000       8       -       -       - read/write/exec   libocrb10.so
FFFFFFFF7E900000    1368     536     536       - read/exec         libocr10.so
FFFFFFFF7EB54000      72      56       -      56 read/write/exec   libocr10.so
FFFFFFFF7EC00000       8       8       -       8 read/write/exec     [ anon ]
FFFFFFFF7ED00000       8       8       8       - read/exec         libskgxn2.so
FFFFFFFF7EE00000       8       8       -       8 read/write/exec   libskgxn2.so
FFFFFFFF7EF00000    1736    1088    1088       - read/exec         libhasgen10.so
FFFFFFFF7F1B0000      72      64       -      64 read/write/exec   libhasgen10.so
FFFFFFFF7F1C2000       8       8       -       8 read/write/exec   libhasgen10.so
FFFFFFFF7F200000     128     128     128       - read/exec         libskgxp10.so
FFFFFFFF7F31E000      16      16       -      16 read/write/exec   libskgxp10.so
FFFFFFFF7F400000       8       8       8       - read/exec         libc_psr.so.1
FFFFFFFF7F500000       8       8       -       8 read/write/exec     [ anon ]
FFFFFFFF7F600000     176     176     176       - read/exec         ld.so.1
FFFFFFFF7F72C000      16      16       -      16 read/write/exec   ld.so.1
FFFFFFFF7FFE0000     128     128       -     128 read/write          [ stack ]
----------------  ------  ------  ------  ------
        total Kb 6416104 6347336 6344392    2944

Look at the resident size of the private section of the segments: the total private size is about 2.9MB. The largest private chunk is the heap, but it is only 1/3rd of the total private space. The remaining part of the private area resident in physical memory is made of anon segments and of private data sections.

_use_real_free_heap is set to true (the default in Oracle 10), meaning that different heaps are used for the process portion of the PGA, the CGA (call global area), the UGA and possibly other components. _use_ism_for_pga is also set to its default value (false), meaning the PGA is indeed part of the heap, not allocated from an ISM segment.

This is where it gets interesting: the number of Oracle user processes at this time of the day is around 2000. For 100 randomly selected Oracle processes, I plotted the private size occupied by each process: they all had a private memory of 3MB +/- 10%, and it was unlikely that any process would allocate significantly more memory than the others. 3MB of private memory per process x 2000 = 6GB: that’s about 4.9GB of “non PGA” private space (PGA in use = 1.1GB). This example highlights the fact that the UGA, the CGAs and other portions of private memory, although seldom accounted for when sizing the memory, are not always negligible when the database hosts many connections.
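The estimate is a simple extrapolation from the sampled processes. The sketch below only restates the arithmetic with the rounded figures of the text (3MB per process, 2000 processes, 1.1GB of PGA in use); the variable names are mine.

```python
private_mb_per_process = 3.0  # observed on 100 sampled processes, +/- 10%
process_count = 2000          # approximate number of Oracle user processes
pga_in_use_gb = 1.1           # "total PGA inuse", rounded

# Extrapolate the sampled private size to all processes (1GB taken as 1000MB).
total_private_gb = private_mb_per_process * process_count / 1000

# Private memory that the PGA in use does not account for.
non_pga_private_gb = total_private_gb - pga_in_use_gb

print(f"Total private memory:   {total_private_gb:.1f} GB")
print(f"Non-PGA private memory: {non_pga_private_gb:.1f} GB")
```

With the +/- 10% spread observed on the sample, the total would range from roughly 5.4GB to 6.6GB, so the order of magnitude holds either way.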
