Christian Bilien’s Oracle performance and tuning blog

December 13, 2007

One of Pavlov’s dogs (2/2)

Filed under: Oracle,RAC,Solaris — christianbilien @ 9:15 pm

I didn’t have much luck with the Pavlov’s dogs challenge: not even a single attempt at an explanation. Maybe the problem was too weird (or, more embarrassingly, it did not interest anyone!).

A quick reminder of the challenge: netstat shows almost no activity on an otherwise loaded interconnect (500Mb/s to 1Gb/s inbound, and similar values outbound, as seen on the Infiniband switches and as calculated from the product of the “PX remote messages recv’d” or “PX remote messages sent” counts and parallel_execution_message_size).
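The calculation mentioned above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample figures are hypothetical, and it assumes every PX message fills a whole parallel_execution_message_size buffer, so the result is an upper bound rather than a measurement.

```python
def px_throughput_mbit_per_s(messages, message_size_bytes, interval_s):
    """Upper-bound estimate of PX interconnect throughput in Mb/s.

    messages           -- delta of a "PX remote messages" counter over the interval
    message_size_bytes -- value of parallel_execution_message_size
    interval_s         -- length of the sampling interval in seconds

    Assumption: each message is a full buffer of message_size_bytes.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    bits = messages * message_size_bytes * 8
    return bits / interval_s / 1_000_000


if __name__ == "__main__":
    # Hypothetical sample: 500,000 messages of 16 KB seen over 60 seconds.
    rate = px_throughput_mbit_per_s(500_000, 16_384, 60.0)
    print(f"{rate:.0f} Mb/s")
```

The counter deltas would come from two snapshots of the session or system statistics taken interval_s seconds apart.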

Well, anyway, here is what I think is the answer: the key piece of information I gave was that the clusterware was running RDS over Infiniband. Infiniband HCAs have an inherent advantage over standard Ethernet network interfaces: they embed RDMA, which means that data transfers are handled without interrupting the CPUs. That’s because the sending node reads from and writes to the receiving node’s user-space memory directly, without going through the usual I/O path. TCP/IP NICs, by contrast, generate a number of interrupts that the CPUs have to process, because TCP segments have to be reassembled while other threads are running.

The most likely cause of the netstat blindness is just that it cannot see the packets because the CPUs are unaware of them.

To quote the Wikipedia article on Pavlov’s dogs, “the phrase ‘Pavlov’s dog’ is often used to describe someone who merely reacts to a situation rather than use critical thinking”. That’s exactly how I saw myself when I was trying to put the blame on the setup instead of thinking twice about the “obvious” way of measuring network packet throughput.

3 Comments »

  1. The Infiniband protocol is still pretty exotic and RDS is even more so. That’s probably why you did not get many comments.

    I am not sure I agree with your comment about “netstat blindness”. There is no inherent reason why netstat could not have been modified to output Infiniband traffic information. Infiniband RDS performance counters are readily available, for example in Linux, and could have been integrated into netstat:

    # iba_proc_read /proc/driver/rds/stats
    Rds Statistics:
    Sockets open: 69
    End Nodes connected: 1
    Performance Counters: ON
    Transmit:
    Xmit bytes 6321933103
    Xmit packets 3219991
    Xmit errors 0

    Comment by Val Carey — December 14, 2007 @ 3:19 am

  2. Val,

    I certainly agree with you that netstat could have been changed to integrate the RDS/IB statistics, since I can see them in kstat. My conclusion (confirmed by a Solaris SE) came from the fact that I could see the IPoIB traffic but not the RDS traffic. As I understand it, SDP (sockets), RDS or SRP (SCSI) must be used with IB for RDMA to be enabled. Otherwise (with IPoIB, for example), the CPU is involved, and netstat shows the statistics.

    kstat -m rds
    module: rds instance: 0
    name: rds_kstat class: misc
    crtime 103.978122548
    rds_enobufs 0
    rds_ewouldblocks 0
    rds_failovers 0
    rds_nports 128
    rds_nsessions 4
    rds_port_quota 140
    rds_port_quota_adjusted 0
    rds_post_recv_buf_called 6612
    rds_rx_bytes 1578871131
    rds_rx_errors 0
    rds_rx_pkts 1982786
    rds_rx_pkts_pending 0
    rds_stalls_ignored 0
    rds_stalls_recvd 0
    rds_stalls_sent 0
    rds_stalls_triggered 0
    rds_tx_acks 1722602
    rds_tx_bytes 1436400309
    rds_tx_errors 0
    rds_tx_pkts 1991663
    rds_unstalls_recvd 0
    rds_unstalls_sent 0
    rds_unstalls_triggered 9254
    snaptime 352179.085671155

    Comment by christianbilien — December 14, 2007 @ 10:22 am
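Since the rds kstat counters are cumulative, a throughput figure comparable to what netstat would report needs two snapshots and the difference in snaptime between them. A minimal Python sketch follows, assuming the output has the "name value" layout shown in the comment above; the function names are hypothetical, and so is any second snapshot you feed it.

```python
def parse_kstat(text):
    """Parse `kstat -m rds` style output into a dict of name -> float.

    Only lines made of exactly two fields with a numeric second field
    are kept, so headers such as "module: rds instance: 0" are skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        name, value = parts
        try:
            stats[name] = float(value)
        except ValueError:
            continue
    return stats


def rates_per_second(before, after,
                     counters=("rds_rx_bytes", "rds_tx_bytes",
                               "rds_rx_pkts", "rds_tx_pkts")):
    """Per-second rates between two parsed snapshots, using snaptime."""
    elapsed = after["snaptime"] - before["snaptime"]
    if elapsed <= 0:
        raise ValueError("snapshots must be in chronological order")
    return {name: (after[name] - before[name]) / elapsed
            for name in counters}
```

Multiplying the byte rates by 8 gives bits per second, which can then be set against the figures read on the Infiniband switches.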

  3. Christian,

    You are absolutely right that SDP or RDS can take advantage of the network-layer processing off-load, kernel bypass, RDMA, etc. that Infiniband offers. IPoIB can still make some use of protocol off-load, but not of RDMA, kernel bypass or zero-copy between system and user space. Since with IPoIB the standard TCP/IP stack can be used, with Infiniband treated just like a very fast Ethernet link-level protocol, all the usual performance counters are apparently available without any need to modify netstat.

    Comment by Val Carey — December 14, 2007 @ 2:44 pm

