This is the second post about using dbms_stats.set_xxx_stats (and feeling guilty about it). I gave in the first post some straightforward cost calculations about an oversimplified (yet real) example of an index scan and the underlying table scan as a preparatory step.
select * from mytable d where status_code=’AAAA’;
status_code is indexed by MYTABLEEI000003, no histograms were calculated. The “adjusted dbf_mbrc” (obtained by dividing the full table scan cost by the number of blocks below the high water mark is 6.59). This is all detailed in here .
Things I did not emphasize in this case :
– The number of distinct value of status_code is 6 and never varies whatever the number of lines in the table. The selectivity of status_code will always be 1/6, which is quite high (bad selectivity).
– The distribution is very unequal 90% of the rows have the same status code. Bind peeking is enabled but the select call does not necessarily goes to the low cardinality values.
– The table scan cost is now BETTER (lower) that the index scan cost.
– optimizer_index_cost_adj is set to 50 (the final index cost is index cost calculation as explained in the first post divided by 2).
The index calculation cost of mytableI000003 follows this formula:
Cost=blevel + ceiling(leaf_blocks * effective index selectivity) + ceiling (clustering factor * effective table selectivity)
which can be approximated to:
cost=(leaf blocks + clustering factor)/6
I went back to old statistics by using various copies of the database made from production to development or user acceptance environments to get a picture if what the statistics were in the past. I then plotted the index vs. the full tables scan costs as a function of the number of blocks below the high water mark:
Here is a zoom of the last two points:
The FTS cost was slightly above the index cost in the past, but it is now under it. This is unfortunate as the actual FTS execution time is here always much higher than when the index is chosen.
So what can be done ?
– Lowering optimizer_index_cost_adj: the whole instance would be impacted
– Creating an outline: I oversimplified the case by just presenting one select but there are actually many calls derived from it, but in any case the index scan is always the good option.
– Hints and code changes: this is a vendor package so the code cannot be amended
This is where the use of dbms_stats.set_column_stats or dbms_stats.set_index_stats seems to me to be a valid option. True, tweaking the statistics goes against common sense, and may be seen as insane by some. However we are in a blatant case where:
- a slight variation in the statistics calculation causes plans to go from good to (very) bad
- the modified statistics will only impacts a known perimeter
- we do not have any other satisfactory solution
A last question remains: because of the relative weights involved in the index cost calculation, it is only practicable to make the index more attractive when
Clustering Factor x selectivity
is lowered.
Of the two factors, I prefer to modify the clustering factor because the optimizer calculates the selectivity either from the density or from num_distinct depending upon the histograms presence. One last word: the clustering factor you set will of course be wiped out the next time statistics are calculated on this index.
Modify the index clustering factor :
begin
dbms_stats.set_index_stats(
ownname => ‘MYOWNER’,
indname => ‘MYTABLEI000003’,
clstfct => 100000);
end;
/
Modify the column density:
begin
dbms_stats.set_column_stats(
ownname => ‘MYOWNER’,
tabname => ‘MYTABLE,
colname => ‘STATUS_CODE’,
density => 0.0001);
end;
/