This post deals only with ISM. I’ll write a second one about Dynamic ISM (DISM).
A long-standing problem on any platform has been the risk that part of the Oracle memory segment gets swapped out, turning what should be a relatively fast memory access into a horrid bottleneck. Oracle 9i on Solaris made use of an interesting feature named Intimate Shared Memory (ISM), which in fact does a lot more than one might initially think.
The very first benefit of ISM (not DISM for the time being) is that the shared memory is locked by the kernel when the segment is created: the memory cannot be paged out. A small price to pay for the locking mechanism is that sufficient unlocked memory must be available for the allocation to succeed.
Setting the SHM_SHARE_MMU flag in the shmat system call, which is what sets up the shared segment as ISM, also brings lesser-known benefits that may matter more than the no-paging scheme on CPU-bound systems.
Shared kernel virtual-to-physical translation
Virtual-to-physical address translation is one of the most expensive tasks any modern operating system has to perform. The hardware Translation Lookaside Buffer (TLB) is a physical cache in front of the slower in-memory tables. The Translation Storage Buffer (TSB) is a further, in-memory translation cache. Even in Solaris 10, the standard System V scheme is still to give each process a private virtual address space, which leads to aliasing (several virtual addresses mapping to the same physical address).
ISM allows the kernel virtual-to-physical mappings to be shared between the processes that attach to the shared memory, saving a considerable number of translation slots in the hardware TLB. This can be monitored on Solaris 10 with trapstat:
# trapstat -T
cpu m size| itlb-miss %tim itsb-miss %tim | dtlb-miss %tim dtsb-miss %tim |%tim
----------+-------------------------------+-------------------------------+----
512 u   8k|      1761  0.1      2841  0.2 |      2594  0.1      2648  0.2 | 0.5
512 u  64k|         0  0.0         0  0.0 |         8  0.0         0  0.0 | 0.0
512 u 512k|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
512 u   4m|        20  0.0         1  0.0 |         4  0.0         0  0.0 | 0.0
512 u  32m|         0  0.0         0  0.0 |        11  0.0         0  0.0 | 0.0
512 u 256m|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
trapstat shows both instruction and data misses, in both the TLB and the TSB.
Solaris 8 does not have trapstat, so the trick is to use cpustat.
Here is a non-idle Oracle system using ISM, as the mpstat output shows:
mpstat 5 5
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
  0    0   0  282  728  547 1842  283  329   62  10  3257  40   8 25  27
  1    0   0  122  227    2 1954  284  327   55   9  3639  39   6 29  26
  2    0   0  257 1578 1399 1887  288  330   58   9  3287  35  11 27  27
  3    1   0  313 1758 1501 1933  285  328   70  12  3437  36   8 29  27
cpustat -c pic0=Cycle_cnt,pic1=DTLB_miss 1
   time cpu event      pic0      pic1
  1.010   3  tick 192523799     29658
  1.010   2  tick 270995815     28499
  1.010   0  tick 225156772     29621
  1.010   1  tick 234603152     29034
psrinfo -v
Status of processor 3 as of: 05/14/07 12:48:53
Processor has been on-line since 03/11/07 10:35:22.
The sparcv9 processor operates at 1062 MHz,
and has a sparcv9 floating point processor.
cpustat shows that processor 3 took 29658 dTLB misses in this one-second sample. An UltraSPARC III spends somewhere between 50 cycles (most favourable case: no TLB entry miss) and 300 cycles (worst case: a memory load has to be performed to compute the translation) handling each of them. That is about 1.5 million cycles per second in the best scenario and 8.9 million in the worst. At 1062 MHz, the time spent handling dTLB misses is only between 0.14% and 0.84%!
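The arithmetic above can be reproduced with a short shell sketch. The function name dtlb_overhead_pct and its arguments are illustrative and not part of any Solaris tool. The 50 and 300 cycle costs are the estimates quoted above, the sample is treated as exactly one second long, and a POSIX awk is assumed (nawk or /usr/xpg4/bin/awk on Solaris, since the -v option is needed).

```shell
#!/bin/sh
# Estimate the share of CPU time spent handling dTLB misses.
# Hypothetical helper, arguments:
#   $1 = dTLB misses per second (pic1 column from cpustat)
#   $2 = assumed cycles needed to handle one miss
#   $3 = CPU clock in MHz (from psrinfo -v)
dtlb_overhead_pct() {
    LC_ALL=C awk -v misses="$1" -v cycles="$2" -v mhz="$3" \
        'BEGIN { printf "%.2f\n", misses * cycles / (mhz * 1000000) * 100 }'
}

# Figures for processor 3 in the sample above.
dtlb_overhead_pct 29658 50 1062     # best case,  prints 0.14
dtlb_overhead_pct 29658 300 1062    # worst case, prints 0.84
```

Feeding in the pic1 value of another CPU, or a different cycle cost, gives the corresponding percentage for that case.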
Large pages
From Solaris 2.6 through Solaris 8, large pages are available only through ISM (using SHM_SHARE_MMU).
Solaris 8
pagesize
8192
Solaris 10: default page size
pagesize
8192
Supported page sizes:
pagesize -a
8192
65536
524288
4194304
ISM page size on Solaris 10 (look at the Pgsz column). It looks like Oracle is using the largest page size available:
pmap -sx 25921
25921: oracleSID1 (LOCAL=NO)
         Address   Kbytes     RSS    Anon  Locked Pgsz Mode  Mapped File
00000001064D2000       24      24      24       -   8K rwx-- [ heap ]
00000001064D8000       32       8       -       -    - rwx-- [ heap ]
0000000380000000  1048576 1048576       - 1048576   4M rwxsR [ ism shmid=0x6f000078 ]
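The comparison between the supported page sizes and the page size actually backing the ISM segment can be scripted. What follows is a minimal sketch, not an official procedure: largest_page_size and ism_page_sizes are hypothetical helper names, and the second one assumes the Pgsz value is the sixth whitespace-separated field of pmap -sx output, as in the listing above. Sample data from this post is piped in so that the sketch runs on any system.

```shell
#!/bin/sh
# Print the largest value from a list of page sizes in bytes,
# one per line, as produced by `pagesize -a`.
largest_page_size() {
    awk 'NR == 1 || $1 > max { max = $1 } END { print max }'
}

# Print the Pgsz column (assumed to be field 6) of every ISM
# mapping found in `pmap -sx` output read from standard input.
ism_page_sizes() {
    awk '/ism shmid/ { print $6 }'
}

# Sample data copied from the listings in this post.
printf '%s\n' 8192 65536 524288 4194304 | largest_page_size    # prints 4194304
printf '%s\n' \
    '00000001064D2000 24 24 24 - 8K rwx-- [ heap ]' \
    '0000000380000000 1048576 1048576 - 1048576 4M rwxsR [ ism shmid=0x6f000078 ]' \
    | ism_page_sizes                                            # prints 4M
```

On a live Solaris 10 system the same helpers would be fed from the real commands, for example pagesize -a | largest_page_size and pmap -sx 25921 | ism_page_sizes, where 25921 is the process ID used above.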
AMD 64/x64. The AMD Opteron processor supports both 4Kbyte and 2Mbyte page sizes:
pagesize -a
4096
2097152
x86. The implementation of Solaris on x86 processors provides support for 4Kbyte pages only.
This post will be followed up by a discussion about DISM, the differences with ISM and a word of caution about using DISM on Solaris 8:
Oracle ISM and DISM: more than a no paging scheme…but be careful with Solaris 8 (2/2)