Christian Bilien’s Oracle performance and tuning blog

September 20, 2007

Is dbms_stats.set_xxx_stats insane tuning? (part 1/2)

Filed under: Oracle — christianbilien @ 11:04 am

The challenge is a very common one: changing the plan of SQL calls when

– the calls are generated by a vendor package (i.e. you cannot easily have the code changed, so no hints can be used)

and

– the SQL calls cannot be syntactically predicted, either because they are assembled according to user input criteria (i.e. no stored outlines) or because the set of calls is too numerous to handle one by one

and

– I do not want to alter the initialization parameters (OPTIMIZER_INDEX_COST_ADJ and OPTIMIZER_INDEX_CACHING most of the time) because they apply to the whole instance

and

– dbms_sqltune / SQL Tuning Advisor (10g) does not provide the right plan, or I am working on an earlier version (*)

This is where dbms_stats.set_xxx_stats comes to mind, with the fear that one of my interlocutors will start calling me insane the moment I mutter “I may have another option”.

Some background might be necessary: ever since I read Wolfgang Breitling’s papers (http://www.centrexcc.com/) on the 10053 trace and the cost algorithm some years ago, followed by a growing enthusiasm while reading Jonathan Lewis’ “Cost-Based Oracle Fundamentals”, I have thought that setting statistics is a valid approach provided that three requirements are met:

  • There is hardly another way of getting to an acceptable plan (not even necessarily the “best”)
  • You know what you are doing and the potential consequences on other plans
  • The customer is psychologically receptive: he or she understands what you are doing, why you are doing it that way, and the need to maintain the solution or even to discard it in the future.

I’ll explain the basics in this first part to clarify my point. Jump directly to the second part (once it is written) if you already know about cost calculations (or risk being bored).

On the technical side, my most common use of dbms_stats.set_xxx_stats is to change index statistics via the set_index_stats procedure in order to make an index more attractive.
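
A minimal sketch of such a call, with a made-up index name and figures chosen purely for illustration (numlblks feeds leaf_blocks, clstfct the clustering factor, indlevel the blevel):

begin
  dbms_stats.set_index_stats(
    ownname       => user,        -- schema owning the index
    indname       => 'MYINDEX',   -- hypothetical index name
    numlblks      => 1000,        -- leaf_blocks the CBO will see
    clstfct       => 20000,       -- clustering_factor the CBO will see
    indlevel      => 2,           -- blevel
    no_invalidate => false        -- invalidate dependent cursors right away
  );
end;
/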

I’ll start with the most basic example to make my point. Let’s consider a select statement such as:

select * from mytable where status_code='AAAA';

mytable is not partitioned, status_code is indexed (non-unique) and no histograms were calculated.

In this particular case, comparing status_code to a bind variable or to a literal makes no difference as far as cost calculation is concerned, but it does when the predicate is >, <, >=, etc. The most annoying bit is that the selectivity algorithms changed significantly in 9i and in each 10g version when the predicate is anything other than a simple equality. Another glitch is that the calculation changes depending on whether status_code is in bounds or out of bounds (i.e. whether or not it lies between user_tab_col_statistics.low_value and user_tab_col_statistics.high_value). Check Wolfgang Breitling’s “A look under the hood of the CBO” and Jonathan Lewis’ book for more comprehensive research.
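
As a side note, the recorded bounds can be inspected in the dictionary. A quick sketch using the table and column of this example (low_value and high_value are stored as RAW, and utl_raw.cast_to_varchar2 is only meaningful for character columns):

select column_name,
       utl_raw.cast_to_varchar2(low_value)  as low_value,
       utl_raw.cast_to_varchar2(high_value) as high_value
from   user_tab_col_statistics
where  table_name  = 'MYTABLE'
and    column_name = 'STATUS_CODE';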

So let’s get back to our simple select * from mytable where status_code='AAAA';

We know the I/O cost of an index-driven access path will be:

Cost =
blevel +
ceiling(leaf_blocks * effective index selectivity) +
ceiling(clustering_factor * effective table selectivity)

Blevel, the height of the index, is usually a small value compared to the two other terms. Blevel, leaf_blocks and clustering_factor are columns of the user_indexes view.

The effective index selectivity and the effective table selectivity are both calculated by the optimizer.

The effective index selectivity is found in a 9i 10053 trace file as IX_SEL. In our simple select, the index selectivity can be obtained from user_tab_col_statistics or user_tab_columns (the Oracle documentation states that the statistics columns of user_tab_columns, among them density and num_distinct, are only maintained for backward compatibility).
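
For reference, the 10053 trace itself can be produced with the usual event syntax; the statement must be hard parsed for the optimizer to be invoked, and the trace file ends up in user_dump_dest:

alter session set events '10053 trace name context forever, level 1';
select * from mytable where status_code='AAAA';
alter session set events '10053 trace name context off';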

The effective table selectivity is found as TB_SEL in the 9i 10053 trace (it is IX_SEL_WITH_FILTERS in 10g). With the “=” predicate it is:

  • the density column if a histogram is used. You’ll find that density differs from 1/NUM_DISTINCT when histograms are calculated
  • 1/NUM_DISTINCT (of status_code) if not. In this case density simply reports 1/NUM_DISTINCT.

The third factor is often overwhelmingly dominant (it is not, for example, when index fast full scans can be used). It is essential to know which of DENSITY and NUM_DISTINCT is really used, especially in 10g where many databases are left with the default statistics gathering (which computes histograms).
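
A quick way of checking which figure the optimizer will pick up for our column (if a histogram was gathered, density will drift away from 1/num_distinct):

select num_distinct,
       density,
       1/num_distinct as one_over_ndv   -- compare with density
from   user_tab_col_statistics
where  table_name  = 'MYTABLE'
and    column_name = 'STATUS_CODE';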

Let’s see how this works in 9i (9.2.0.7) with cpu costing disabled:

Index access cost

set autotrace traceonly

alter session set "_optimizer_cost_model"=io;  -- cpu costing is enabled for this database, so force the I/O cost model

select * from mytable where status_code='AAAA';

Execution Plan

———————————————————-

0   SELECT STATEMENT Optimizer=CHOOSE (Cost=20872 Card=690782 Bytes=148518130)
1 0   TABLE ACCESS (BY INDEX ROWID) OF 'MYTABLE' (Cost=20872 Card=690782 Bytes=148518130)
2 1     INDEX (RANGE SCAN) OF 'MYTABLEI000003' (NON-UNIQUE) (Cost=1432 Card=690782)

Let’s make the computation to verify the cost of 20872:

Blevel = 2
Leaf blocks = 8576
Clustering factor = 116639
Here, index selectivity = table selectivity = density = 1/num_distinct = 1/6 = 0.166667

leaf_blocks * effective index selectivity = 8576 / 6 = 1429.33
clustering_factor * effective table selectivity = 116639 / 6 = 19439.83

Cost = blevel + ceiling(leaf_blocks * effective index selectivity) + ceiling(clustering_factor * effective table selectivity) = 2 + 1430 + 19440 = 20872
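
The same figure can be recomputed straight from the dictionary. A sketch using the names of this example, valid here because no histogram was gathered (so density = 1/num_distinct):

select i.blevel
       + ceil(i.leaf_blocks * c.density)
       + ceil(i.clustering_factor * c.density) as io_cost
from   user_indexes i, user_tab_col_statistics c
where  i.index_name  = 'MYTABLEI000003'
and    c.table_name  = 'MYTABLE'
and    c.column_name = 'STATUS_CODE';   -- returns 20872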

As expected, clustering factor * table selectivity is by far the dominant factor.

Table access cost

The basic formula is “number of blocks below the high water mark” divided by db_file_multiblock_read_count. It has to be amended though: Oracle actually uses as the divisor an “adjusted dbf_mbrc”, as Jonathan Lewis calls it, which is a function of both db_file_multiblock_read_count and the block size. W. Breitling inferred in “A look under the hood of the CBO” how the divisor varies as a function of db_file_multiblock_read_count for a fixed block size. In other words: full scan cost ≈ ceiling(blocks below HWM / adjusted dbf_mbrc).

I used a 16k block size and db_file_multiblock_read_count=16, and derived the adjusted dbf_mbrc by dividing the number of blocks below the HWM (46398) by the I/O cost:

alter session set optimizer_index_cost_adj=100;
alter session set "_optimizer_cost_model"=io;
select /*+ full(d) */ * from mytable d where status_code='AAAA';

Execution Plan

———————————————————-

0   SELECT STATEMENT Optimizer=CHOOSE (Cost=7044 Card=544275 Bytes=111032100)
1 0   TABLE ACCESS (FULL) OF 'MYTABLE' (Cost=7044 Card=544275 Bytes=111032100)

The adjusted dbf_mbrc = 46398/7044 = 6.59 (remember that db_file_multiblock_read_count = 16!)
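
The same derivation can be made from the dictionary, assuming user_tables.blocks reflects the blocks below the HWM at the time statistics were gathered (7044 is the full scan cost read from the plan above):

select round(blocks / 7044, 2) as adjusted_dbf_mbrc
from   user_tables
where  table_name = 'MYTABLE';   -- blocks = 46398 here, giving 6.59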

Interestingly, the adjusted dbf_mbrc is 4.17 when db_file_multiblock_read_count=8: the adjustment is much less “intrusive”.

A final word: the adjusted dbf_mbrc is only about the cost calculation, not about the actual I/O size.

(*) Jonathan Lewis describes in the “Cost-Based Oracle Fundamentals” book a possible tweak to SQL profiles using the OPT_ESTIMATE hint, with the warning that this is an undocumented feature, subject to change. So I’ll leave it out of this discussion. Dbms_stats.set_xxx_stats is, on the other hand, a documented and supported option.

3 Comments »

  1. […] is the second post about using dbms_stats.set_xxx_stats (and feeling guilty about it). I gave in the first post some straightforward cost calculations about an oversimplified (yet real) example of an index scan […]

    Pingback by Is dbms_stats.set_xxx_stats insane tuning ? (part 2/2) « Christian Bilien’s Oracle performance and tuning blog — September 28, 2007 @ 1:37 pm

  2. I needed to compare index costs and came across your site.

    –> not bad.

    However, you did not include a test case for the calculation of the effective index selectivity in the case of a composite index on 2 or 3 columns.

    Same for the effective table selectivity.

    Comment by Hervé Bodin — January 24, 2008 @ 11:33 am

  3. Thank you for your comment,

    Composite index selectivity does not apply as such: selectivities are combined using probability theory:
    predicate1 AND predicate2 ==> selectivity = selectivity1 * selectivity2
    predicate1 OR predicate2 ==> selectivity = selectivity1 + selectivity2 - selectivity1 * selectivity2

    Haven’t we met before?

    Cheers

    Christian

    Comment by christianbilien — January 24, 2008 @ 4:32 pm

