Christian Bilien’s Oracle performance and tuning blog

December 9, 2007

One of Pavlov’s dogs (1/2)

Filed under: Oracle,RAC,Solaris — christianbilien @ 7:44 pm

That’s what I thought of myself after spending an hour trying to figure out what I had done wrong in my setup.

So here is a little challenge I derived from this experience:

1. I set up a 5 node 10gR2 RAC over a mix of 5 Xeon and Opteron 4 cores. The interconnect is RDS over an HCA Infiniband. The setup was made on OpenSolaris. I know, RAC is not a supported on OpenSolaris but we were advised by Sun MicroSystems that the Infiniband layer was significantly faster on OpenSolaris that it was on Solaris 10. Some ftp tests indeed showed that even IPoIB was 20% faster on OpenSolaris than on Solaris 10. So I had to tweak the Oracle installer and could not use any of the Oracle Guis, but I got it working.

2. select /*+ full(c) full(o) */ * from soe.product_descriptions c,soe.product_information o where c.product_id=o.product_id and c.product_id>300000;

was run simultaneously 10 times on each instance (this is incidentally a data base generated by Swingbench ). The DOP varies from 10 to 40, but it does not make much difference as far as our challenge is concerned.

I then plotted the ‘PX remote messages recv’d’ and ‘PX remote messages sent’.

Parallel_execution_message_size=8192

I could get the following peak throughput (5s interval) by multiplying parallel_execution_message_size by the PX remote messages figures:

PX remote messages recv’d Mb/s PX remote messages sent Mb/s

792,82

1077,22

1035,78

756,21

697,01

885,18

I am not taking into account the GES and the GCS messages, nor did I count the cache fusion blocks. Both of them were small anyway. The weird thing came when I tried to measure the corresponding HCA Input / Output packet number from netstat:

Input Packets/s Output Packets/s

69

73

39

35

58

6

Almost no traffic on the interconnect (the HCA MTU is 2044 bytes) !

Let’s check that the interconnect is running over the intended HCA:

netstat -ni

Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs CollisQueue
lo0 8232 127.0.0.0 127.0.0.1 32364353 0 32364353 0 0
ibd0 2044 10.0.0.0 10.0.0.1 27588368 0 26782527 0 0
nge1 1500 10.x.x.x 10.x.x.x 640710585 0 363462595 0 0

SQL> oradebug setmypid
SQL> oradebug ipc
SQL> oradebug tracefile_name

[…]
SSKGXPT 0x64362c8 flags SSKGXPT_READPENDING socket no 8 IP 10.0.0.1
[…]

Just to be sure:

SQL> select indx,picked_ksxpia,ip_ksxpia from x$ksxpia;
INDX PICKED_KSXPIA IP_KSXPIA
————————————————————-
0 CI 10.0.0.1

Question: What could possibly justify the discrepancies between the netstat output and the values collected from v$px_sesstat ?

Hints: it is not a bug

 

1 Comment »

  1. […] under: Oracle, Solaris — christianbilien @ 9:15 pm I didn’t get much chance with the Pavlov’s dogs challenge : not even a single attempt at an explanation. Maybe it was too weird a problem (or more […]

    Pingback by One of Pavlov’s dogs (2/2) « Christian Bilien’s Oracle performance and tuning blog — December 13, 2007 @ 9:15 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: