Christian Bilien’s Oracle performance and tuning blog

September 12, 2007

Strategies for RAC inter-instance parallelized queries (part 1/2)

Filed under: Oracle,RAC — christianbilien @ 8:32 pm

I recently had to sit down and think about how I would spread workloads across nodes in a RAC data warehouse configuration. The challenge was quite interesting, here were the specifications:

  • Being able to fire PQ slaves on different instances in order to use all available resources for most of the aggregation queries.
  • Based on a tnsname connection, being able to restrict PQ access to one node because external tables may be used. Slaves running on the wrong node would fail :

insert /*+ append parallel(my_table,4) */ into my_table
ERROR at line 1:
ORA-12801: error signaled in parallel query server P001, instance node-db04:INST2 (2)
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file myfile408.12.myfile_P22.20070907.0.txt in MYDB_INBOUND_DIR not found
ORA-06512: at “SYS.ORACLE_LOADER”, line 19

I could come up with two basic configurations based on combinations of services, INSTANCE_GROUPS and PARALLEL_INSTANCE_GROUP. This post is about an asymmetric configuration, the second will deal with a symmetric configuration.

Parallel executions are services aware: the slaves inherit the data base service from the coordinator. But the coordinator may execute slaves on any instance of the cluster, which means that the service localization (‘PREFERRED’ instance) will create the coordinator on the henceforth designed instance, but the slaves are not bound by the service configuration. So I rewrote “being able to restrict PQ access to one node” into “being able to restrict PQ access and the coordinator to one node”

An instance belongs to the instance group it declares in its init.ora/spfile.ora. Each instance may have several instance groups declared: INSTANCE_GROUPS =’IG1’,’IG2’ (be careful not to set ‘IG1,IG2’). PARALLEL_INSTANCE_GROUP is another parameter which can be either configured at the system or at session level. Sessions may fire PQ slaves on instances for which the parallel instance group matches and instance group the instance belongs to.

Let’s consider a 2 nodes configuration: node1 and node2. I1 and I2 are instances respectively running on node1 and node2.
The Oracle Data Warehouse documentation guide states that a SELECT statement can be parallelized for objects schema created with a PARALLEL declaration only if the query involves either a full table table scan or an inter partition index range scan. I’ll force a full table scan in my tests case to have the CBO decide to pick up a parallel plan:

select /*+ full(orders_part)*/ count(*) from orders_part;

The configuration described below will allow users :

  • connected to node1 to execute both the coordinator and slaves on node1, and prevent the slaves to spill on node2
  • connected to node2 but unable to do an alter session (because the code belongs to an external provider) to load balance their queries across the nodes
  • connected to node 1 and able to issue an alter session to load balance.

In a nutshell, users connecting to node1 will restrict their slave scope to node1, users connecting to node2 will be allowed to load balance their slaves over all the nodes.

I set for the test parallel_min_servers=0: I can then see the slaves starting whenever Oracle decides to fire them.

Asymmetric INSTANCE_GROUPS configuration:

On node1, spfileI1.ora looks like:

INSTANCE_GROUPS =’IG1’,’IG2’
PARALLEL_INSTANCE_GROUP=’IG1’

On node2, spfileI2.ora contains:

INSTANCE_GROUPS =’IG2’
PARALLEL_INSTANCE_GROUP=’IG2’

select inst_id, name, value from gv$parameter where name like ‘%instance%group%’;

INST_ID NAME VALUE

———- ——————————————————————————-

1 instance_groups IG1, IG2
1 parallel_instance_group IG1
2 instance_groups IG2
2 parallel_instance_group IG2

Single node access

I configured via dbca nodetaf1, a service for which node1/instance I1 was the preferred node and node was set to “not use” (I do not want to connect on one node an execute on another – thereby unnecessarily clogging the interconnect —) and did the same for node2/instance I2.

SQL> select inst_id,name from gv$active_services where name like ‘%taf%’;

INST_ID NAME

———- —————————————————————-

2 nodetaf2
1 nodetaf1

The related TNSNAMES entries for a node1 only access looks like:

Node1-only =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = node1)(PORT = 1521))

)

(LOAD_BALANCING=NO)
(CONNECT_DATA =
(SERVICE_NAME= nodetaf1)
(SID = EMDWH1)
(SERVER = DEDICATED)

)
)

Note that the host name is not the vip address (because there is no point in switching node should node1 fail).

Test results:

$ sqlplus mylogin/mypass@node1-only

< no alter session set parallel_instance_group= >
select /*+ full(orders_part) */ count(*) from orders_part;

select qcsid, p.inst_id “Inst”, p.server_group “Group”, p.server_set “Set”,s.program
from gv$px_session p,gv$session s
where s.sid=p.sid and qcsid=&1
order by qcinst_id , p.inst_id,server_group,server_set

QCSID Inst Group Set PROGRAM
———- ———- ———- ———- ————————————————

938 1 1 1 oracle@node1 (P000)
938 1 1 1 oracle@node1 (P004)
938 1 1 1 oracle@node1 (P002)
938 1 1 1 oracle@node1 (P005)
938 1 1 1 oracle@node1 (P006)
938 1 1 1 oracle@node1 (P001)
938 1 1 1 oracle@node1 (P007)
938 1 1 1 oracle@node1 (P003)
938 1 sqlplus@node1 (TNS V1-V3)

The coordinator and the slaves stay on node1

Dual node access

I then added a service aimed at firing slave executions on both nodes. The ‘bothnodestaf’ was added using dbca and then modified to give it “goal_throughput” and “clb_goal_short” load balancing advisories: according to the documentation, load balancing is based on rate that works is completed in service plus available bandwidth. I’ll dig into that one day to get a better understanding of the LB available strategies.

execute dbms_service.modify_service (service_name => ‘bothnodestaf’ –
, aq_ha_notifications => true –
, goal => dbms_service.goal_throughput –
, failover_method => dbms_service.failover_method_basic –
, failover_type => dbms_service.failover_type_select –
, failover_retries => 180 –
, failover_delay => 5 –
, clb_goal => dbms_service.clb_goal_short);

In the tnsnames.ora:

bothnodes =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = myvip1)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = myvip2)(PORT = 1521))

)

(CONNECT_DATA =
(SERVICE_NAME = bothnodestaf)
(FAILOVER_MODE =
(TYPE = SELECT)
(METHOD = BASIC)
(RETRIES = 180)
(DELAY = 5)

)
(SERVER = DEDICATED)
)
)

Note that LOAD_BALANCE is unset (set to OFF) in the tnsnames.ora to allow server-side connection balancing

On node1:

$ sqlplus mylogin/mypass@bothnodes

alter session set parallel_instance_group=’IG2′;
select /*+ full(orders_part) */ count(*) from orders_part;

QCSID Inst Group Set PROGRAM

———- ———- ———- ———- ————————————————

936 1 1 1 oracle@node1 (P002)
936 1 1 1 oracle@node1 (P003)
936 1 1 1 oracle@node1 (P001)
936 1 1 1 oracle@node1 (P000)
936 1 1 1 oracle@node2 (P037)
936 1 1 1 oracle@node2 (P027)
936 2 1 1 oracle@node1 (O001)
936 2 1 1 oracle@node2 (P016)
936 2 1 1 oracle@node2 (P019)
936 2 1 1 oracle@node2 (P017)
936 2 1 1 oracle@node2 (P018)
936 1 sqlplus@node1 (TNS V1-V3)

The slaves started on both nodes.

On node2:

$ sqlplus mylogin/mypass@bothnodes
< no alter session set parallel_instance_group= >
select /*+ full(orders_part) */ count(*) from orders_part;

QCSID Inst Group Set PROGRAM

———- ——————————————————————————-

952 1 1 1 oracle@node1 (P002)
952 1 1 1 oracle@node1 (P003)
952 1 1 1 oracle@node1 (P001)
952 1 1 1 oracle@node1 (P000)
952 1 1 1 oracle@node2 (P037)
952 1 1 1 oracle@node2 (P027)
952 2 1 1 oracle@node1 (O001)
952 2 1 1 oracle@node2 (P016)
952 2 1 1 oracle@node2 (P019)
952 2 1 1 oracle@node2 (P017)
952 2 1 1 oracle@node2 (P018)
952 1 sqlplus@node1 (TNS V1-V3)

Again, connecting on a node2 services allows load balancing between the nodes

 

 

4 Comments »

  1. […] @ 5:52 am If you have not done so, it would be beneficial to first go through the first post “Strategies for parallelized queries across RAC instances” to get an understanding of the concepts and challenge. I came up with a description of an […]

    Pingback by Strategies for parallelized queries across RAC instances (part 2/2) « Christian Bilien’s Oracle performance and tuning blog — September 14, 2007 @ 5:52 am

  2. Very Useful article

    Comment by Mayvins — April 11, 2008 @ 11:19 am

  3. Helpful article keep it up.

    Comment by Anonymous — June 5, 2010 @ 6:29 pm

  4. Super article,thanks a lot.

    Comment by Fahd Mirza — May 18, 2011 @ 3:29 pm


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: