Christian Bilien’s Oracle performance and tuning blog

May 25, 2007

Oracle ISM and DISM: more than a no paging scheme (2/2)… but be careful with Solaris 8

Filed under: Oracle,Solaris — christianbilien @ 9:39 pm

This post is the DISM follow-up to the ISM-only post Oracle ISM and DISM: more than a no paging scheme (1/2).

DISM (Dynamic Intimate Shared Memory) is the pageable variant of ISM, made available in Solaris 8. A DISM segment is attached to a process through the shmat system call: the new SHM_DYNAMIC flag tells shmat to create Dynamic ISM, instead of the SHM_SHARE_MMU flag used for ISM.

DISM is like ISM except that it isn’t automatically locked. The application, not the kernel, does the locking, using mlock. Kernel virtual-to-physical address translation structures are shared among the processes that attach to the DISM segment, which is one of the benefits of DISM: it saves kernel memory and CPU time. As with ISM, shmget creates the segment; the size specified in shmget is the maximum size of the segment, which can be larger than physical memory. Enough disk swap should be made available to cover the maximum possible DISM size.
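As a rough illustration, here is a minimal C sketch of the call sequence an application might use to create a DISM segment. The segment size and the locked range are made-up values and this is not Oracle’s actual code:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t max_size = 1024UL * 1024 * 1024;   /* maximum segment size (1 GB) */
    size_t used     =  256UL * 1024 * 1024;   /* portion actually needed now */

    /* shmget reserves the maximum size; it may exceed physical memory */
    int shmid = shmget(IPC_PRIVATE, max_size, IPC_CREAT | 0600);
    if (shmid == -1) { perror("shmget"); exit(1); }

    /* SHM_DYNAMIC asks for DISM instead of ISM (SHM_SHARE_MMU) */
    char *addr = shmat(shmid, NULL, SHM_DYNAMIC);
    if (addr == (char *)-1) { perror("shmat"); exit(1); }

    /* Unlike ISM, the kernel does not lock the pages: the application
       locks the range it really uses with mlock (privileges required) */
    if (mlock(addr, used) != 0) perror("mlock");

    return 0;
}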

Per the Oracle 10gR2 installation guide on Solaris platforms:

Oracle Database automatically selects ISM or DISM based on the following criteria:

  • Oracle Database uses DISM if it is available on the system, and if the value of the SGA_MAX_SIZE initialization parameter is larger than the size required for all SGA components combined. This enables Oracle Database to lock only the amount of physical memory that is used.
  • Oracle Database uses ISM if the entire shared memory segment is in use at startup or if the value of the SGA_MAX_SIZE parameter is equal to or smaller than the size required for all SGA components combined. 

I ran a few logical-I/O-intensive tests aimed at highlighting a possible performance loss when moving from ISM to DISM (as pages are not permanently locked in memory, swap management has to be invoked), but I couldn’t find any meaningful difference. Most of the benefits described in the Oracle ISM and DISM: more than a no paging scheme (1/2) post still apply, except for the lack of large page support in Solaris 8 (see below).

Since DISM requires the application to lock memory, and since memory locking can only be carried out by processes with superuser privileges, the $ORACLE_HOME/bin/oradism daemon runs as root using setuid (early 9i releases had a different mechanism, using RBAC instead of setuid).

Solaris 8 problems:

Dynamic Intimate Shared Memory (DISM) was introduced in the 1/01 release of Solaris 8 (Update 3). DISM was supported by Oracle9i for SGA resizing.

On a 10gR2 database running on Solaris 10, it can be seen that large pages are used by DISM:

pmap -sx 19609| more

19609: oracleSID11 (LOCAL=NO)

Address Kbytes RSS Anon Locked Pgsz Mode Mapped File
0000000380000000 16384 16384 4M rwxs- [ dism shmid=0x70000071 ]

Per the following SunSolve note: http://sunsolve.sun.com/search/document.do?assetkey=1-9-72952-1&searchclause=dism%2420large%2420page

“In this first release, large MMU pages were not supported. For Solaris 8 systems with 8GB of memory or less, it is reasonable to expect a performance degradation of up to 10% compared to ISM, due to the lack of large page support in DISM […] Sun recommends avoiding DISM on Solaris 8 either where SGAs are greater than 8 Gbytes in size, or on systems with a typical CPU utilization of 70% or more. In general, where performance is critical, DISM should be avoided on Solaris 8. As we will see, Solaris 9 Update 2 (the 12/02 release) is the appropriate choice for using DISM with systems of this type.”

The Sun blueprint http://www.sun.com/blueprints/0104/817-5209.pdf advocates the use of DISM on Solaris 8 primarily for machine maintenance, such as removing a memory board, but it fails to mention that large MMU pages are not supported.

May 14, 2007

Oracle ISM and DISM: more than a no paging scheme (1/2)

Filed under: Oracle,Solaris — christianbilien @ 12:54 pm

This post only deals with ISM. I’ll write a second one about Dynamic ISM (DISM).

A long-standing problem on any platform has been the risk that part of the Oracle shared memory segment gets swapped out, turning what is normally a fast memory access into a horrid bottleneck. Oracle 9i on Solaris made use of an interesting feature named Intimate Shared Memory (ISM), which in fact does a lot more than one may initially think.

The very first benefit of ISM (not DISM for the time being) is that the shared memory is locked by the kernel when the segment is created: the memory cannot be paged out. A small price to pay for the locking mechanism is that sufficient unlocked physical memory must be available for the allocation to succeed.

Because the SHM_SHARE_MMU flag is set in the shmat system call to set up the shared segment as ISM, there are also lesser-known benefits, which may be of higher importance than the no-paging scheme on CPU-bound systems.
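As a rough illustration, here is a minimal C sketch of an ISM attach (with a made-up 1 GB segment size); the kernel locks the pages when the segment is created, so no explicit locking call is needed by the application:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t size = 1024UL * 1024 * 1024;       /* 1 GB shared segment */

    int shmid = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (shmid == -1) { perror("shmget"); exit(1); }

    /* SHM_SHARE_MMU requests ISM: pages locked by the kernel, shared
       translation structures, and large pages where supported */
    char *addr = shmat(shmid, NULL, SHM_SHARE_MMU);
    if (addr == (char *)-1) { perror("shmat"); exit(1); }

    return 0;
}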

 

Shared kernel virtual-to-physical translation

The virtual-to-physical mapping is one of the most resource-consuming tasks a modern operating system has to perform. The hardware Translation Lookaside Buffer (TLB) is a physical cache of the slower in-memory translation tables. The Translation Storage Buffer (TSB) is a further, in-memory, translation cache. Since, even in Solaris 10, the standard System V algorithm is still to give each process a private virtual address space, aliasing occurs (several virtual addresses exist that map to the same physical address).

ISM allows the sharing of these kernel virtual-to-physical translation structures between the processes that attach to the shared memory, saving a considerable number of translation slots in the hardware TLB. This can be monitored on Solaris 10 with trapstat:

# trapstat -T

cpu m size| itlb-miss %tim itsb-miss %tim | dtlb-miss %tim dtsb-miss %tim |%tim
----------+-------------------------------+-------------------------------+----
512 u   8k|      1761  0.1      2841  0.2 |      2594  0.1      2648  0.2 | 0.5
512 u  64k|         0  0.0         0  0.0 |         8  0.0         0  0.0 | 0.0
512 u 512k|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
512 u   4m|        20  0.0         1  0.0 |         4  0.0         0  0.0 | 0.0
512 u  32m|         0  0.0         0  0.0 |        11  0.0         0  0.0 | 0.0
512 u 256m|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0

trapstat shows both instruction and data misses, in both the TLB and the TSB.

Solaris 8 does not have trapstat, so the trick is to use cpustat:

On a non-idle Oracle system using ISM as seen below,

mpstat 5 5

CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys  wt idl
  0    0   0  282  728  547 1842  283  329   62  10  3257  40   8  25  27
  1    0   0  122  227    2 1954  284  327   55   9  3639  39   6  29  26
  2    0   0  257 1578 1399 1887  288  330   58   9  3287  35  11  27  27
  3    1   0  313 1758 1501 1933  285  328   70  12  3437  36   8  29  27

cpustat -c pic0=Cycle_cnt,pic1=DTLB_miss 1

time cpu event      pic0  pic1
1.010   3  tick 192523799 29658
1.010   2  tick 270995815 28499
1.010   0  tick 225156772 29621
1.010   1  tick 234603152 29034

psrinfo -v

Status of processor 3 as of: 05/14/07 12:48:53
  Processor has been on-line since 03/11/07 10:35:22.
  The sparcv9 processor operates at 1062 MHz,
        and has a sparcv9 floating point processor.

cpustat shows that on processor 3 we had 29658 dTLB misses during this one-second sample. UltraSPARC III uses somewhere between 50 cycles (most favourable case: the translation is found in the TSB) and 300 cycles (worst case: a memory load has to be performed to compute the translation) to handle a dTLB miss. Handling the misses will therefore take about 1.5 million cycles per second in the best scenario and about 8.9 million in the worst. At 1062 MHz, the time spent handling dTLB misses is only between 0.14% and 0.84%!
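For the curious, here is a tiny C sketch that just reproduces the arithmetic above, with the 50 and 300 cycles per miss and the 1062 MHz clock taken from this example (the per-miss cycle counts are assumptions, not measurements):

#include <stdio.h>

int main(void)
{
    double misses_per_sec = 29658.0;     /* dTLB misses sampled on CPU 3 */
    double cpu_hz         = 1062.0e6;    /* 1062 MHz UltraSPARC III      */

    double best_cycles  = misses_per_sec * 50.0;    /* ~1.5 million cycles/s */
    double worst_cycles = misses_per_sec * 300.0;   /* ~8.9 million cycles/s */

    printf("best case : %.2f%% of one CPU\n", 100.0 * best_cycles  / cpu_hz);
    printf("worst case: %.2f%% of one CPU\n", 100.0 * worst_cycles / cpu_hz);
    return 0;
}

It prints 0.14% and 0.84%, the figures quoted above.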

Large pages.

From Solaris 2.6 through Solaris 8, large pages are only available through the use of the ISM (using SHM_SHARE_MMU).

Solaris 8

pagesize
8192

Solaris 10: default page size

pagesize
8192

Supported page sizes:

pagesize -a
8192
65536
524288
4194304

ISM page size on Solaris 10 (look at the Pgsz column): it looks like Oracle is using the largest available page size.

 

pmap -sx 25921

25921: oracleSID1 (LOCAL=NO)

Address Kbytes RSS Anon Locked Pgsz Mode Mapped File

00000001064D2000 24 24 24 8K rwx-- [ heap ]

00000001064D8000 32 8 - rwx-- [ heap ]

0000000380000000 1048576 1048576 1048576 4M rwxsR [ ism shmid=0x6f000078 ]

AMD64/x64. The AMD Opteron processor supports both 4 Kbyte and 2 Mbyte page sizes:

pagesize -a
4096
2097152

x86. The implementation of Solaris on x86 processors provides support for 4Kbyte pages only.

 

This post will be followed up by a discussion about DISM, the differences with ISM and a word of caution about using DISM on Solaris 8:
Oracle ISM and DISM: more than a no paging scheme…but be careful with Solaris 8 (2/2)

May 1, 2007

Two useful hidden parameters: _smm_max_size and _pga_max_size.

Filed under: Oracle — christianbilien @ 8:07 pm

You may have heard of the great “Burleson” vs “Lewis” controversy about the spfile/init hidden PGA parameters. You can have a glimpse of the battlefield at http://www.jlcomp.demon.co.uk/untested.html (Jonathan Lewis’s side) and, to be fair, on the opposite side at http://www.dba-oracle.com/art_so_undocumented_pga_parameters.htm. If you have the courage to fight your way through the intricacies of the arguments, taking into account that 1) this war only seems to apply to 9i and 2) the above URLs (at least the one from Don Burleson) may have been rewritten since Jonathan Lewis made his comments, you may be left with the same sense of misunderstanding I had when I went through it.

1) Is it worth fiddling with those undocumented parameters?

The answer is yes (at least for me). Not that I am a great fan of setting these parameters for fun in production; it is just that three times in one year I came across a good reason to set them.

2) How does it work?

I tried to understand their meaning from the various Metalink notes and searches across the web, and I then verified the values, hoping not to miss anything.

I’ll avoid a discussion of parallel operations, as the settings are more complex and depend upon the degree of parallelism. I’ll spend some time investigating this, but for now I’ll stick with what I tested.

1. Context

The advent of automatic PGA management (if enabled!) in Oracle 9i was meant to be a relief from the *_area_size parameters, which dictated how large a sort area could grow before the temporary tablespace would be used. Basically, the sort area sizes were acting as a threshold: your sort was performed in memory if the required sort memory was smaller than the threshold, and it went to disk if it was larger. The trouble with this strategy was that the sort area had to be small enough to accommodate many processes sorting at the same time, but on the other hand a single large sort alone on the instance could only use up to the sort area size before spilling to disk. Pooling the sort areas under the PGA umbrella removed these shortcomings. However, the Oracle designers had to cope with the possibility of a process hogging the sort memory, leaving no space for others. This is why some limits on the sort memory available to a work area and to a single process were put in place, using a couple of hidden parameters:

_smm_max_size: Maximum workarea size for one process

_pga_max_size: Maximum PGA size for a single process

A sort will spill to disk if either of these two thresholds is crossed.

2. Default values

9i (and probably 10gR1, which I did not test):

_pga_max_size: default value is 200MB.

_smm_max_size: the default value is the least of 5% of pga_aggregate_target and 50% of _pga_max_size. A ceiling of 100MB also applies. The ceiling is hit when pga_aggregate_target exceeds 2GB (5% of 2GB = 100MB), or when _pga_max_size is set to a higher value than the default AND pga_aggregate_target is lower than 2GB.

10gR2

pga_aggregate_target now drives _smm_max_size in most cases:

  • pga_aggregate_target <= 500MB: _smm_max_size = 20% * pga_aggregate_target
  • pga_aggregate_target between 500MB and 1000MB: _smm_max_size = 100MB
  • pga_aggregate_target > 1000MB: _smm_max_size = 10% * pga_aggregate_target

_smm_max_size in turn now drives _pga_max_size: _pga_max_size = 2 * _smm_max_size

A pga_aggregate_target larger than 1000MB therefore allows much higher default thresholds in 10gR2: a pga_aggregate_target set to 5GB allows an _smm_max_size of 500MB (it was 100MB before) and a _pga_max_size of 1000MB (it was 200MB).
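To make the 10gR2 rules above concrete, here is a small C sketch that simply re-expresses them (sizes are in MB, the 5000 MB target is just an illustration, and this is not Oracle code):

#include <stdio.h>

/* 10gR2 default _smm_max_size derived from pga_aggregate_target (in MB),
   following the three bands described above */
static long smm_max_size(long pat_mb)
{
    if (pat_mb <= 500)  return pat_mb * 20 / 100;   /* 20% of the target */
    if (pat_mb <= 1000) return 100;                 /* flat 100 MB band  */
    return pat_mb * 10 / 100;                       /* 10% above 1000 MB */
}

int main(void)
{
    long pat = 5000;                      /* a 5 GB pga_aggregate_target */
    long smm = smm_max_size(pat);

    /* prints _smm_max_size = 500 MB, _pga_max_size = 1000 MB */
    printf("_smm_max_size = %ld MB, _pga_max_size = %ld MB\n", smm, 2 * smm);
    return 0;
}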

You can get the hidden parameter values by querying x$ksppcv and x$ksppi as follows:

select a.ksppinm name, b.ksppstvl value from sys.x$ksppi a, sys.x$ksppcv b where a.indx = b.indx and a.ksppinm = '_smm_max_size';

select a.ksppinm name, b.ksppstvl value from sys.x$ksppi a, sys.x$ksppcv b where a.indx = b.indx and a.ksppinm = '_pga_max_size';
