Christian Bilien’s Oracle performance and tuning blog

November 8, 2007

Can an explain plan on a SELECT self deadlock? Yes it can … but it is a bug

Filed under: Oracle — christianbilien @ 11:49 am

I had to work on a problem last week that may be of interest to some readers of this blog: the symptoms were a general slowness, and some sessions appeared to be hung. The database version is 10.2.0.2. The initial investigation showed that many sessions were waiting on the cursor: pin S wait on X’ event (from the “oracle performance and tuning” book, “a session waits for this event when it is requesting a shared mutex pin and session is holding an exclusive mutex pin on the same cursor object”).

A look at Metalink pointed at a Bug.5485914: MUTEX REPORTED SELF DEADLOCK or a variant of it as several related bugs showing similar symptoms exist.

I got a blocker (oper EXCL) and several waiters (oper GET_SHRD) on the same idn (this is the hash value of the cursor) from a systemstate dump .

 

grep Mutex mysid1_ora_19352.trc | grep EXCL
Mutex 48b914ac8(2811, 0) idn d376b3ad oper EXCL
[…]

grep Mutex mysid1_ora_19352.trc | grep GET_SHRD
Mutex 48b914ac8(2811, 0) idn d376b3ad oper GET_SHRD
Mutex 48b914ac8(2811, 0) idn d376b3ad oper GET_SHRD
Mutex 48b914ac8(2811, 0) idn d376b3ad oper GET_SHRD
Mutex 48b914ac8(2811, 0) idn d376b3ad oper GET_SHRD
[…]

 

<the number of lines which contained “GET_SHRD” were the same as the number of sessions which waited on ‘cursor: pin S wait on X’>.
Let’s expand the systemstate dump to see if we can get some information. The raw trace shows:

Blocker:

 

437231 kkkk-dddd-llll=0000-0001-0001 lock=N pin=X latch#=5 hpc=0002 hlc=0002
437232 lwt=512358c50[512358c50,512358c50] ltm=512358c60[512358c60,512358c60]
437233 pwt=512358c18[512358c18,512358c18] ptm=512358c28[512358c28,512358c28]
437234 ref=512358c80[4858d43b8,4858d43b8] lnd=512358c98[512358c98,512358c98]

437235 LIBRARY OBJECT: object=4ac8ff848
437236 type=CRSR flags=EXS[0001] pflags=[0000] status=VALD load=0
437237 DEPENDENCIES: count=4 size=16
437238 ACCESSES: count=3 size=16
437239 DATA BLOCKS:
437240 data# heap pointer status pins change whr
437241 —– ——– ——– ——— —- —— —
437242 0 429db5ae0 4ac8ff960 I/P/A/-/- 0 NONE 00
437243 6 48b914bc0 4156e8b08 I/P/A/-/E 0 NONE 00
437244 —————————————-
437245 KGX Atomic Operation Log 5124e0970

437246 Mutex 48b914ac8(2811, 0) idn d376b3ad oper EXCL
437247 Cursor Pin uid 2811 efd 0 whr 1 slp 0
437248 opr=3 pso=3fa9b9158 flg=0
437249 pcs=48b914ac8 nxt=0 flg=35 cld=0 hd=512358ba8 par=4b6560520
437250 ct=0 hsh=0 unp=0 unn=0 hvl=b65607f8 nhv=1 ses=539af3098
437251 hep=48b914b48 flg=80 ld=1 ob=4ac8ff848 ptr=4156e8b08 fex=4156e7e18

 

This a KGX mutex which protects library cache pins from 10.2.0.2 instead of the library Library Cache Pin latch.

A blocked session on the same cursor :

 

kkkk-dddd-llll=0000-0001-0001 lock=N pin=X latch#=5 hpc=0002 hlc=0002
lwt=512358c50[512358c50,512358c50] ltm=512358c60[512358c60,512358c60]
pwt=512358c18[512358c18,512358c18] ptm=512358c28[512358c28,512358c28]
ref=512358c80[4858d43b8,4858d43b8] lnd=512358c98[512358c98,512358c98]
LIBRARY OBJECT: object=4ac8ff848
type=CRSR flags=EXS[0001] pflags=[0000] status=VALD load=0
DEPENDENCIES: count=4 size=16
ACCESSES: count=3 size=16
DATA BLOCKS:
data# heap pointer status pins change whr
—– ——– ——– ——— —- —— —
0 429db5ae0 4ac8ff960 I/P/A/-/- 0 NONE 00
6 48b914bc0 4156e8b08 I/P/A/-/E 0 NONE 00
—————————————-
KGX Atomic Operation Log 482acfdf0
Mutex 48b914ac8(2811, 0) idn d376b3ad oper GET_SHRD

Cursor Pin uid 2853 efd 0 whr 5 slp
opr=2 pso=3f9bcf4e8 flg=0
pcs=48b914ac8 nxt=0 flg=35 cld=0 hd=512358ba8 par=4b6560520
ct=0 hsh=0 unp=0 unn=0 hvl=b65607f8 nhv=1 ses=539af3098
hep=48b914b48 flg=80 ld=1 ob=4ac8ff848 ptr=4156e8b08 fex=4156e7e18

a little bit further down the file :

[........]
—————————————-
SO: 539b0dc78, type: 4, owner: 53a239108, flag: INIT/-/-/0x00
session) sid: 2853 trans: 52da63ae8, creator: 53a239108, flag: (41) USR/- BSY/-/-/-/KIL/-
DID: 0001-044C-00001916, short-term DID: 0000-0000-00000000
txn branch: 52b0cfb98
oct: 3, prv: 0, sql: 4bcc7a680, psql: 482bef848, user: 360/MYSIDDBMAINT
O/S info: user: apache, term: , ospid: 6782, machine: dwhsrv-db02
program: oracleEMDWH1@dwhsrv-db02 (TNS V1-V3)
application name: oracleEMDWH1@dwhsrv-db02 (TNS V1-V3), hash value=0
waiting for ‘cursor: pin S wait on X’ blocking sess=0x0 seq=30138 wait_time=0 seconds since wait started=0
idn=d376b3ad,
value=afb00000000, where|sleeps=503c72f15
Dumping Session Wait History
for ‘cursor: pin S wait on X’ count=1 wait_time=19214
idn=d376b3ad
, value=afb00000000, where|sleeps=503c72f02
for ‘cursor: pin S wait on X’ count=1 wait_time=9996
idn=d376b3ad
, value=afb00000000, where|sleeps=503c72ef0
for ‘cursor: pin S wait on X’ count=1 wait_time=19143
idn=d376b3ad
, value=afb00000000, where|sleeps=503c72edf
[…]

 

Access to object 4ac8ff848 is obviously causing trouble. One easy way to get a more user friendly reading of the systemstate dump is to download LTOM from note id 352363.1 in metalink to get ass109.awk which is found in the src directory.

It does not seem to work with awk:

awk -f ass109.awk mysid1_ora_19352.trc awk: syntax error near line 113 awk: bailing out near line 113
but it does with nawk:
nawk -f ass109.awk mysid1_ora_19352.trc >ass109.out
 From ass109.out:

 

Blockers
[…] Below is a summary of the waited upon resources, together with the holder of that resource.
Notes:

~~~~~
o A process id of ‘???’ implies that the holder was not found in the systemstate.
Resource Holder State
LOCK: handle=4f7be3a98 1020: last wait for ‘library cache lock’
Enqueue TT-00000006-00000000 ??? Blocker
Object Names
~~~~~~~~~~~~
LOCK: handle=4f7be3a98 TABL:HOLLAND.EQUITY
Enqueue TT-00000006-00000000
151:waiting for ‘library cache lock’ [LOCK: handle=4f7be3a98] wait
791:waiting for ‘library cache lock’ [LOCK: handle=4f7be3a98] wait
1076:waiting for ‘library cache lock’ [LOCK: handle=4f7be3a98] wait
[…]

 

As you can see above, the blocker cannot be found by ass1009.awk, but we can trace to the object HOLLAND.EQUITY the waiting sessions.

sqlplus> desc HOLLAND.EQUITY

indeed hung.

I traced the offending call on this table to an explain plan of a SELECT that will be duly submitted to the Oracle support. One last word: I had to kill the session that issued it both in Oracle and at the system level (kill -9). I reproduced the problem and tried the other way round (killing the process first), but the session also had to be killed within Oracle.

I googled for my problem which seems to fall in the same category as http://momendba.blogspot.com/2007/06/cursor-pin-s-wait-on-x-wait-event-in.html . Jonathan Lewis also picked it up http://jonathanlewis.wordpress.com/2007/08/20/intelligent-stats.There is however in my (reproducible) case no call to dbms_stats. As some of the related bugs refer to sql_trace, I might be closer to them.

How to get a systemstate dump (10.2.0.2)

SQL> oradebug setmypid
Statement processed.
SQL> oradebug unlimit
Statement processed.
SQL> oradebug dump systemstate 10
Statement processed.
SQL> oradebug tracefile_name

 


 

 

 

About these ads

8 Comments »

  1. When we first hit mutex hangs in 10.2.0.1, we turned them off and never looked back. Perhaps when 10.2.0.4 is finally unleashed upon us, we will take a look at it again, but trying to reproduce it is always dicey.

    Thanks for such a clear, detailed write-up.

    Comment by Charles Schultz — November 14, 2007 @ 7:37 pm

  2. As a matter of curiosity, do you remember the circumstances in which your instance hanged ?

    Thanks

    Christian

    Comment by christianbilien — November 14, 2007 @ 7:41 pm

  3. Our SR (5374131.993) is over a year old, hence I am not able to retrieve it easily via metalink. I will ask our Sale/Tech Consultant for details. I fear that we did not capture quite the detail that you did merely because we were flying blind and clueless at the time. =) Whatever information I can find, I will pass along.

    Comment by Charles Schultz — November 15, 2007 @ 1:53 pm

  4. Thanks

    Christian

    Comment by christianbilien — November 15, 2007 @ 2:33 pm

  5. Apparently, we hit bug 5485914. We were executing DBMS_MONITOR.SESSION_TRACE_ENABLE and were unable to disable the trace and locked up the session. I can email you our SR text if you are curious, but you may get more mileage out of researching that bug (and other related bugs). To my surprise, we left the mutexes on (“_kks_use_mutex_pin” = true) and never applied the bug fix. Interesting….

    We had another case where the whole instance would inexplicably hang (waiting for the infamous “cursor: pin S wait on X”); since we were never able to reproduce it for Oracle, I do not have any further information on that particular case.

    Comment by Charles Schultz — November 15, 2007 @ 3:52 pm

  6. Sorry, I am stupid. Your whole post was about that very bug. My apologies.

    Comment by Charles Schultz — November 15, 2007 @ 3:56 pm

  7. Thanks very much for the info, it was helpful to me: it confirms that it had to do with explain plans and/or tracing. I could also reproduce the problem by using set autotrace traceonly in sqlplus.

    Christian

    Comment by christianbilien — November 15, 2007 @ 7:46 pm

  8. The mutex technology is too immature in version 10.2.0.2 …..

    Comment by maclean — June 6, 2010 @ 2:46 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: