A while ago, I came across an interesting case of memory starvation on a Oracle DB server running Solaris 8 that was for once not directly related to the SGA or the PGA. The problem showed up from a user perspective as temporary “hangs” that only seemed to happen at a specific time of the day. This server is dedicated to a single 10gR2 Oracle instance. Looking at the OS figures, the first things that I saw were some vmstat signs of memory pressure:
A high number of page reclaims, 200 to 500 Mb of free memory left out of 16GB and 2000 to 3000 pages/s scanned by the page scanner. Look at how memory is allocated using prtmem:
Total memory: 15614 Megabytes Kernel Memory: 1534 Megabytes Application: 12888 Megabytes Executable & libs: 110 Megabytes File Cache: 410 Megabytes Free, file cache: 250 Megabytes Free, free: 430 Megabytes
But look at the Oracle SGA and PGA:
SGA:
Total System Global Area 6442450944 bytes Fixed Size 2038520 bytes Variable Size 3489662216 bytes Database Buffers 2936012800 bytes Redo Buffers 14737408 bytesPGA:
select * from v$pgastatNAME VALUE UNIT ---------------------------------------------------------------- ---------- ------------ aggregate PGA target parameter 1048576000 bytes aggregate PGA auto target 65536000 bytes global memory bound 2258944 bytes total PGA inuse 1181100032 bytes total PGA allocated 2555433984 bytes maximum PGA allocated 2838683648 bytes total freeable PGA memory 755367936 bytes process count 1943 max processes count 2273 PGA memory freed back to OS 2.5918E+11 bytes total PGA used for auto workareas 9071616 bytes
Well, that’s at this point 7.5GB (again out of 16GB) for the PGA currently in use + the SGA allocation. I assumed here that because of the memory pressure, the unused part of the PGA was already paged out. Prtmem showed an “application” memory size of 12.9GB.
Where are the 5.4 GB gone ?
I looked at the structure of one of the processes using pmap –x:
pmap -x 14816
Address Kbytes Resident Shared Private Permissions Mapped File
0000000100000000 100536 44600 44600 - read/exec oracle
000000010632C000 816 560 368 192 read/write/exec oracle
00000001063F8000 912 904 - 904 read/write/exec [ heap ]
0000000380000000 16384 16384 16384 - read/write/exec/shared [ ism shmid=0x5004 ]
00000003C0000000 3145728 3145728 3145728 - read/write/exec/shared [ ism shmid=0x2005 ]
0000000480000000 3129360 3129360 3129360 - read/write/exec/shared [ ism shmid=0x11007 ]
FFFFFFFF7B270000 128 48 - 48 read/write [ anon ]
FFFFFFFF7B300000 64 64 - 64 read/write [ anon ]
FFFFFFFF7B310000 448 328 - 328 read/write [ anon ]
FFFFFFFF7B400000 8 8 8 - read/write/exec/shared [ anon ]
FFFFFFFF7B500000 8 8 - 8 read/write/exec [ anon ]
FFFFFFFF7B600000 16 16 16 - read/exec libmp.so.2
FFFFFFFF7B704000 8 8 - 8 read/write/exec libmp.so.2
FFFFFFFF7B800000 8 8 - 8 read/write/exec [ anon ]
FFFFFFFF7B900000 216 216 216 - read/exec libm.so.1
FFFFFFFF7BA34000 16 16 - 16 read/write/exec libm.so.1
FFFFFFFF7BB00000 24 24 24 - read/exec librt.so.1
FFFFFFFF7BC06000 8 8 - 8 read/write/exec librt.so.1
FFFFFFFF7BD00000 32 32 32 - read/exec libaio.so.1
FFFFFFFF7BE08000 8 8 - 8 read/write/exec libaio.so.1
FFFFFFFF7BF00000 728 728 728 - read/exec libc.so.1
FFFFFFFF7C0B6000 56 56 - 56 read/write/exec libc.so.1
FFFFFFFF7C0C4000 8 8 - 8 read/write/exec libc.so.1
FFFFFFFF7C100000 8 8 - 8 read/write/exec [ anon ]
FFFFFFFF7C200000 8 8 8 - read/exec libsched.so.1
FFFFFFFF7C302000 8 8 - 8 read/write/exec libsched.so.1
FFFFFFFF7C400000 8 8 - 8 read/write/exec libdl.so.1
FFFFFFFF7C500000 32 24 24 - read/exec libgen.so.1
FFFFFFFF7C608000 8 8 - 8 read/write/exec libgen.so.1
FFFFFFFF7C700000 56 56 56 - read/exec libsocket.so.1
FFFFFFFF7C80E000 16 16 - 16 read/write/exec libsocket.so.1
FFFFFFFF7C900000 672 672 672 - read/exec libnsl.so.1
FFFFFFFF7CAA8000 64 64 - 64 read/write/exec libnsl.so.1
FFFFFFFF7CAB8000 32 32 - 32 read/write/exec libnsl.so.1
FFFFFFFF7CB00000 8 8 - 8 read/write/exec [ anon ]
FFFFFFFF7CC00000 8 8 8 - read/exec libkstat.so.1
FFFFFFFF7CD02000 8 8 - 8 read/write/exec libkstat.so.1
FFFFFFFF7CE00000 2176 376 376 - read/exec libnnz10.so
FFFFFFFF7D11E000 240 232 16 216 read/write/exec libnnz10.so
FFFFFFFF7D15A000 8 - - - read/write/exec libnnz10.so
FFFFFFFF7D200000 72 72 72 - read/exec libdbcfg10.so
FFFFFFFF7D310000 8 8 - 8 read/write/exec libdbcfg10.so
FFFFFFFF7D400000 1056 112 112 - read/exec libclsra10.so
FFFFFFFF7D606000 48 32 - 32 read/write/exec libclsra10.so
FFFFFFFF7D612000 8 - - - read/write/exec libclsra10.so
FFFFFFFF7D700000 8 8 - 8 read/write/exec [ anon ]
FFFFFFFF7D800000 9256 3272 3272 - read/exec libjox10.so
FFFFFFFF7E208000 560 472 8 464 read/write/exec libjox10.so
FFFFFFFF7E300000 1056 208 208 - read/exec libocrutl10.so
FFFFFFFF7E506000 56 48 16 32 read/write/exec libocrutl10.so
FFFFFFFF7E514000 8 - - - read/write/exec libocrutl10.so
FFFFFFFF7E600000 1256 136 136 - read/exec libocrb10.so
FFFFFFFF7E838000 64 56 - 56 read/write/exec libocrb10.so
FFFFFFFF7E848000 8 - - - read/write/exec libocrb10.so
FFFFFFFF7E900000 1368 536 536 - read/exec libocr10.so
FFFFFFFF7EB54000 72 56 - 56 read/write/exec libocr10.so
FFFFFFFF7EC00000 8 8 - 8 read/write/exec [ anon ]
FFFFFFFF7ED00000 8 8 8 - read/exec libskgxn2.so
FFFFFFFF7EE00000 8 8 - 8 read/write/exec libskgxn2.so
FFFFFFFF7EF00000 1736 1088 1088 - read/exec libhasgen10.so
FFFFFFFF7F1B0000 72 64 - 64 read/write/exec libhasgen10.so
FFFFFFFF7F1C2000 8 8 - 8 read/write/exec libhasgen10.so
FFFFFFFF7F200000 128 128 128 - read/exec libskgxp10.so
FFFFFFFF7F31E000 16 16 - 16 read/write/exec libskgxp10.so
FFFFFFFF7F400000 8 8 8 - read/exec libc_psr.so.1
FFFFFFFF7F500000 8 8 - 8 read/write/exec [ anon ]
FFFFFFFF7F600000 176 176 176 - read/exec ld.so.1
FFFFFFFF7F72C000 16 16 - 16 read/write/exec ld.so.1
FFFFFFFF7FFE0000 128 128 - 128 read/write [ stack ]
---------------- ------ ------ ------ ------
total Kb 6416104 6347336 6344392 2944
Look at the resident size of the private section of the segments: the total private size is about 2.9MB. The largest private chunk is the heap, but it is only 1/3rd of the total private space. The remaining part of the private area resident in physical memory is made of anon segments and of private data sections.
_use_real_free_heap=true (the default in Oracle 10), meaning different heaps are used for the process portion of the PGA plus the CGA (call global area) and the UGA and possibly of other components. _use_ism_for_pga is also set to its default value (false), meaning the PGA is indeed part of the heap, not allocated from an ISM segment.
This is where it gets interesting: the number of oracle user processes at a given point at this time of the day is around 2000. I plotted for 100 randomly selected oracle processes the private size occupied by each of the processes: they all had a private memory of 3MB +/- 10%, and it was unlikely that any of the processes would significantly allocate more memory than the others. 3MB of private memory/process x 2000= 6GB: that’s about 4.9GB of “non PGA” private space (PGA in use=1.1GB).This example highlights the fact that UGA, CGAs and other portions of private memory, although seldom accounted for sizing the memory are not always negligible when the data base hosts many connexions.
[...] The post is here: Where has all my memory gone ? [...]
Pingback by RSS blunder « Christian Bilien’s Oracle performance and tuning blog — January 7, 2008 @ 11:24 am
Hi,
I disagree with this “it was unlikely that any of the processes would significantly allocate more memory than the others.”
Yes some process consumes more . what do you see when you crosscheck a couple processes with the query below.
select p.spid proces_id,value/1024/1024 Memory_MB,s.sid FROM v$session s, v$process p, v$sesstat st
WHERE s.paddr = p.addr and st.STATISTIC#=20 and s.sid=st.sid order by 2 desc;
Comment by Tahsin — January 10, 2008 @ 9:42 am
Thank you for your disagreement. I am not sure you read the last paragraph though (I plotted 100 processes) and of course what I am saying here applies to THIS case not everyone else on the earth. Here is an excerpt of your output, which just shows the oracle uga (besides, you’ll see that there is a rather large gap between the UGA+PGA and the private process size, which was precisely my point).
PROCES_ID MEMORY_MB SID
———— ———- ———-
17108 .532676697 2618
28664 .532524109 3293
28954 .532524109 2150
3838 .525535583 2533
3906 .525382996 1320
4171 .525382996 2803
PROCES_ID MEMORY_MB SID
———— ———- ———-
4664 .525382996 2806
24800 .525382996 2870
5154 .525382996 2874
18847 .525382996 2922
28674 .525382996 2951
3670 .525382996 3009
19034 .525382996 3053
4028 .525382996 2213
Comment by christianbilien — January 10, 2008 @ 11:00 am
Hello
can you cross pmax output across pga session memory and pga session memory max values.
Below is an excerpt from my system (solaris 10, oracle 9208).
Pmap shows 29264 KB , pga session memory = 5.7360229 and pga session memory max=29.707 MB, which makes me think that for this particular session, unix did not give up the memory it has allocated.
for some other sessions, pga session memory and pmap output are pretty close . I assume for this sessions the unix has given up a portion of the memory it has allocated for the session pga.
pmap -x 5975
Address Kbytes RSS Anon Locked Mode Mapped File
0000000100000000 56064 56032 – – r-x– oracle
00000001037BE000 912 912 128 – rwx– oracle
00000001038A2000 3448 3344 2880 – rwx– [ heap ]
0000000103C00000 20480 20480 20480 – rwx– [ heap ]
0000000380000000 24322048 24322048 – 24322048 rwxsR [ ism shmid=0x300004a ]
FFFFFFFF7C1B0000 1024 1024 944 – rw–R [ anon ]
FFFFFFFF7C4B0000 256 256 232 – rw–R [ anon ]
FFFFFFFF7C4F0000 128 128 112 – rw–R [ anon ]
FFFFFFFF7C510000 1664 1664 1360 – rw–R [ anon ]
FFFFFFFF7C7B0000 384 384 344 – rw–R [ anon ]
FFFFFFFF7C810000 640 640 536 – rw–R [ anon ]
FFFFFFFF7C9B0000 192 192 168 – rw–R [ anon ]
FFFFFFFF7C9E0000 832 832 744 – rw–R [ anon ]
FFFFFFFF7CB30000 448 448 368 – rw–R [ anon ]
FFFFFFFF7CBA0000 64 64 56 – rw–R [ anon ]
FFFFFFFF7CC00000 16 16 16 – rw–R [ anon ]
FFFFFFFF7CC50000 192 192 168 – rw–R [ anon ]
FFFFFFFF7CD00000 64 32 32 – rwx– [ anon ]
FFFFFFFF7CDE8000 8 8 – – rwxs- [ anon ]
FFFFFFFF7CE00000 8 8 – – r-x– libc_psr.so.1
FFFFFFFF7CF00000 24 16 8 – rwx– [ anon ]
FFFFFFFF7D000000 8 8 8 – rwx– [ anon ]
FFFFFFFF7D100000 56 16 – – r-x– libmd.so.1
FFFFFFFF7D20E000 8 8 8 – rwx– libmd.so.1
FFFFFFFF7D300000 640 216 – – r-x– libm.so.2
FFFFFFFF7D49E000 40 24 24 – rwx– libm.so.2
FFFFFFFF7D500000 8 8 – – r-x– libkstat.so.1
FFFFFFFF7D602000 8 8 – – rwx– libkstat.so.1
FFFFFFFF7D700000 32 32 – – r-x– librt.so.1
FFFFFFFF7D808000 8 8 8 – rwx– librt.so.1
FFFFFFFF7D900000 8 8 8 – rwx– [ anon ]
FFFFFFFF7DA00000 32 32 – – r-x– libaio.so.1
FFFFFFFF7DB08000 8 8 8 – rwx– libaio.so.1
FFFFFFFF7DC00000 936 728 – – r-x– libc.so.1
FFFFFFFF7DDEA000 64 64 48 – rwx– libc.so.1
FFFFFFFF7DDFA000 8 8 8 – rwx– libc.so.1
FFFFFFFF7DE00000 8 8 – – r-x– libdl.so.1
FFFFFFFF7DF02000 8 8 8 – rwx– libdl.so.1
FFFFFFFF7E000000 32 16 – – r-x– libgen.so.1
FFFFFFFF7E108000 8 8 8 – rwx– libgen.so.1
FFFFFFFF7E200000 56 32 – – r-x– libsocket.so.1
FFFFFFFF7E30E000 16 16 16 – rwx– libsocket.so.1
FFFFFFFF7E400000 5384 5376 – – r-x– libjox9.so
FFFFFFFF7EA40000 376 376 216 – rwx– libjox9.so
FFFFFFFF7EA9E000 16 – – – rwx– libjox9.so
FFFFFFFF7EB00000 8 8 8 – rwx– [ anon ]
FFFFFFFF7EC00000 688 264 – – r-x– libnsl.so.1
FFFFFFFF7EDAC000 64 64 48 – rwx– libnsl.so.1
FFFFFFFF7EDBC000 32 32 24 – rwx– libnsl.so.1
FFFFFFFF7EE00000 32 24 – – r-x– libskgxn9.so
FFFFFFFF7EF06000 8 8 8 – rwx– libskgxn9.so
FFFFFFFF7F000000 8 8 8 – rwx– [ anon ]
FFFFFFFF7F100000 8 8 – – r-x– libskgxp9.so
FFFFFFFF7F200000 8 8 – – rwx– libskgxp9.so
FFFFFFFF7F300000 8 8 – – r-x– libodmd9.so
FFFFFFFF7F400000 8 8 8 – rwx– libodmd9.so
FFFFFFFF7F500000 64 64 48 – rwx– [ anon ]
FFFFFFFF7F600000 208 208 – – r-x– ld.so.1
FFFFFFFF7F734000 16 16 16 – rwx– ld.so.1
FFFFFFFF7F738000 8 8 8 – rwx– ld.so.1
FFFFFFFF7FFD8000 160 160 144 – rw— [ stack ]
—————- ———- ———- ———- ———-
total Kb 24417992 24416632 29264 24322048
select * from v$statname where statistic# in (20,21);
STATISTIC# NAME CLASS
———- —————————————————————- ———
20 session pga memory 1
21 session pga memory max 1
select p.spid proces_id,value/1024/1024 Memory_MB,s.sid FROM v$session s, v$process p, v$sesstat st
2 WHERE s.paddr = p.addr and st.STATISTIC#=20 and s.sid=st.sid and s.sid=4519;
PROCES_ID MEMORY_MB SID
———— ——— ——–
5975 5.7360229 4519
select p.spid proces_id,value/1024/1024 Memory_MB,s.sid FROM v$session s, v$process p, v$sesstat st
WHERE s.paddr = p.addr and st.STATISTIC#=21 and s.sid=st.sid and s.sid=4519;
PROCES_ID MEMORY_MB SID
———— ——— ——–
5975 29.7078094482422 4519
Comment by Tahsin — January 10, 2008 @ 2:51 pm
Christian,
what was your solution in this case? Did you limit the number of connections (probably by decreasing some connection pool size) or did you switch to shared server? Or did you pick a completely different solution?
Cheers
robert
Comment by Robert — January 14, 2008 @ 10:55 am
To Tahsin:
Yes I also initially thought that Oracle may not have released the delta between the PGA and MAX PGA. I looked for something else because from v$pgastat, the max pga allocated since instance startup was about 2.8MB (pga in use =1.1GB), so there would still be more than 3GB unaccounted for even if all the PGA one allocated had not been released. There is another reason: the processes constantly logon and logoff from this database, and I did not spot any process with a several GB RSS.
The session PGA was approximately pga in use/2000 for all processes.
Christian
Comment by christianbilien — January 14, 2008 @ 7:02 pm
To Robert,
Well, it happened that this instance was part of a 2 node RAC, so I had the flexibility to move many of them on another instance.
Christian
Comment by christianbilien — January 14, 2008 @ 7:04 pm
Hi Christian,
Consider this scenario
at time t1: process 1 uses 50M pga , all other 1999 processes use 1MB pga each
at time t2: process 2 usess 50M pga, process 1 and all other 1999 processes use 1mB pga each
at time t3: process 3 uses 50m pga, all other process including processes 1 & 2 use 1MB pga each.
blah blah..
at time t2000: process 2000 uses 50M pga and all other process including process 1..1999 use 1MB pga each
so at time t2000, I can say that my pga max allocated is 50M+1M*1999=2049M since the instance is up.However
If the unix did not free the memory for each processes (which can happen quite easily as seen from the excerpt from my earlier comment) , in which case I would have 2000 process each allocated 50M peeking from unix side with pmap command.
Comment by Tahsin — January 21, 2008 @ 4:34 pm
Hello Tahsin,
I’m sure other scenarios could be thought of, such as the one you designed. The key thing I mentionned above is the logon/logoff rate. This is an application which basically logon every time an action has to be performed because the client applications can potentially logon from all over the world from several thousands sources. This is why I believe the “memory not released” can be ruled out in this case.
Cheers
Christian
Comment by christianbilien — January 29, 2008 @ 9:33 am