Oct 8, 2009

Sample size (estimate_percent) affects correctness of database statistics

There are many arguments we can pass to the dbms_stats.gather_table_stats procedure - "method_opt" and "estimate_percent" are two of the most important ones.

The default values (in 10g) for "estimate_percent" and "method_opt" are AUTO (DBMS_STATS.AUTO_SAMPLE_SIZE and 'FOR ALL COLUMNS SIZE AUTO', respectively) - meaning that Oracle decides the sample size and whether a histogram is appropriate for each column.
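For reference, a call that spells out these 10g defaults explicitly would look like the following (schema and table names are just placeholders):

```sql
-- Equivalent to the 10g defaults: Oracle picks the sample size
-- and decides per column whether a histogram is worthwhile.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'TEST',
    tabname          => 'TAB1',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO');
END;
/
```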

Sometimes we customize the sample size to make the stats collection job faster. This customization should be done carefully - otherwise a low sample size can result in non-representative statistics.

For example, when I collected statistics with a 5% sample size, no histograms were created for two of the columns (UPDATED and CREATED).


SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'TEST', tabname=>'TAB1', degree=>2, estimate_percent=>5);


SQL> SELECT column_name, num_buckets, num_distinct, sample_size, histogram
FROM user_tab_col_statistics
WHERE table_name='TAB1';


COLUMN_NAME                    NUM_BUCKETS NUM_DISTINCT SAMPLE_SIZE HISTOGRAM
------------------------------ ----------- ------------ ----------- ---------------
UPDATED                                  1      6382137      633875 NONE
CREATED                                  1      7858623      633875 NONE
VERSION                                 15           15      633875 FREQUENCY
ID                                       1     12677500      633875 NONE
TRASH_DATE                               1       294401      221149 NONE



Now, when I increased the sample size to 50%, the procedure collected height-balanced histograms for those two columns. See below -

SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'TEST', tabname=>'TAB1', degree=>2, estimate_percent=>50);

SQL> SELECT column_name, num_buckets, num_distinct, sample_size, histogram
FROM user_tab_col_statistics
WHERE table_name='TAB1';

COLUMN_NAME                    NUM_BUCKETS NUM_DISTINCT SAMPLE_SIZE HISTOGRAM
------------------------------ ----------- ------------ ----------- ---------------
UPDATED                                254      8929376     6347812 HEIGHT BALANCED
CREATED                                254      9793629     6345628 HEIGHT BALANCED
VERSION                                 21           21     6341908 FREQUENCY
ID                                       1     12692852     6346426 NONE
TRASH_DATE                               1      1217543     2208858 NONE


These different statistics can affect the execution plans of queries that use the CREATED/UPDATED columns. So such changes should be carefully tested, and the sample size should not be underestimated. In fact, in my experience a larger sample size works better for big tables.
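In a custom stats script, the fix boils down to pinning the sample size explicitly. A sketch of such a call (the cascade and method_opt values here are my assumptions, not from the runs above):

```sql
-- Pin the sample size at 50% instead of relying on AUTO_SAMPLE_SIZE.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'TEST',
    tabname          => 'TAB1',
    estimate_percent => 50,
    method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
    cascade          => TRUE,   -- also refresh index statistics
    degree           => 2);
END;
/
```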

P.S. I planned to write a post on this back in January this year, when I experimented with database statistics with and without histograms. Then I became busy with other things and forgot. A few days ago I worked again on custom scripts for collecting database stats, and the sample size problem struck again - I set the sample size to 50% in my scripts, and it is working perfectly!


***

Oct 5, 2009

Optimizer glitch for partitioned table in version 10.2.0.4.0

Recently I discovered that partitioned tables were being accessed through a global index rather than a local (partitioned) index, even though there were good reasons to use the local index.

Initially I thought that, since we had turned off bind variable peeking for execution plan stability (hidden system parameter _optim_peek_user_binds set to FALSE), the optimizer was not looking at the bind values and did not know which partition to access - so accessing through a global index seemed logical. But there were other reasons!
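For context, disabling bind peeking looks like this (hidden parameters need double quotes, and changing them is generally something to do only under Oracle Support's guidance):

```sql
-- Session-level example; the same parameter can also be set in the spfile.
ALTER SESSION SET "_optim_peek_user_binds" = FALSE;
```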

At some point we need to access partitions through the local index, which is more efficient than going through the global index. So I experimented further. In my tests I used literals instead of bind variables - but strangely, the execution plan did not change as I expected, and still showed the old plan with global index access!

The plan looked like -

------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                  |     5 |   750 |  1748   (2)| 00:00:02 |       |       |
|*  1 |  FILTER                                 |                  |       |       |            |          |       |       |
|   2 |   SORT ORDER BY                         |                  |     5 |   750 |  1748   (2)| 00:00:02 |       |       |
|   3 |    HASH UNIQUE                          |                  |     5 |   750 |  1747   (1)| 00:00:02 |       |       |
|   4 |     NESTED LOOPS                        |                  |     5 |   750 |  1746   (1)| 00:00:02 |       |       |
|   5 |      NESTED LOOPS                       |                  |   204 | 11424 |  1333   (1)| 00:00:02 |       |       |
|*  6 |       TABLE ACCESS BY INDEX ROWID       | TAC_LOGIN_ATOK   |    11 |   275 |    23   (0)| 00:00:01 |       |       |
|*  7 |        INDEX RANGE SCAN                 | TAC_LA_LOGIN_ID  |    74 |       |     3   (0)| 00:00:01 |       |       |
|*  8 |       TABLE ACCESS BY GLOBAL INDEX ROWID| TAC_FORM_ATOK    |    19 |   589 |   353   (1)| 00:00:01 | ROWID | ROWID |
|*  9 |        INDEX RANGE SCAN                 | TAC_FA_ATOK_I    |    38 |       |     2   (0)| 00:00:01 |       |       |
|* 10 |      TABLE ACCESS BY GLOBAL INDEX ROWID | TAC_FORM_SUMMARY |     1 |    94 |     2   (0)| 00:00:01 | ROWID | ROWID |
|* 11 |       INDEX UNIQUE SCAN                 | TAC_FS_UFID_PK   |     1 |       |     1   (0)| 00:00:01 |       |       |
-------------------------------------------------------------------------------------------------------------

Something must have been wrong! I thought it could be the statistics, or some other parameter. I rebuilt the local index and re-collected statistics (both with and without histograms), but still got the same execution plan.

Then I searched Metalink for any possible bug (a good practice I learned from a senior colleague). Bingo! I found a related issue - not exactly the same thing, but close. Let me share it -

Bug No. 7210921 - STATISTICS WITH SUBPARTITIONS ARE NOT CORRECTLY INTERPRETED IN EXPLAIN PLANS

It explained that there are optimizer problems with partitioned tables: after the optimizer generates an execution plan, the plan shows wrong statistics for the partitions. Deleting the table statistics was the workaround mentioned in Metalink!

I thought, since the problem seemed similar, what if I deleted the partitioned table's statistics?

SQL> EXEC DBMS_STATS.DELETE_TABLE_STATS(ownname =>'PROD', tabname =>'TAB1');

It helped!! The execution plan changed and showed local (partitioned) index access. But I was not convinced by this solution of deleting table stats - especially for such a big table. Other queries also access the table, and dynamic sampling (in the absence of statistics) could be another nightmare.
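For context, when a table has no statistics the optimizer samples its blocks at parse time; the sampling depth is controlled by the optimizer_dynamic_sampling parameter (or a per-query DYNAMIC_SAMPLING hint). The values below are only illustrative:

```sql
-- Default level in 10g is 2; higher levels sample more blocks at parse time.
ALTER SESSION SET optimizer_dynamic_sampling = 4;

-- Or per query:
SELECT /*+ DYNAMIC_SAMPLING(t 4) */ COUNT(*) FROM TAB1 t;
```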

Then I thought, what if I used an earlier version of the optimizer?

SQL> alter session set optimizer_features_enable='10.2.0.3';


It worked - the execution plan changed! So the problem is specific to version 10.2.0.4.0.

If we don't want to change the optimizer version setting at the system or session level (which may not be good for a large application), we can use a hint to force a particular optimizer version for a single query, like this -

SELECT /*+ optimizer_features_enable('10.2.0.3') */ fs.form_type, fs.prov_id, fs.pgm_id, fs.client_id,
fs.created_by_id, fs.created_date, fs.notif_level, fs.state, fs.form_id, fs.title, fs.summary
FROM .......

Then the execution plan looked as follows - this was what I was looking for:

------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                  |    57 |  8550 |  6160  (43)| 00:00:07 |       |       |
|*  1 |  FILTER                          |                  |       |       |            |          |       |       |
|   2 |   SORT ORDER BY                  |                  |    57 |  8550 |  6160  (43)| 00:00:07 |       |       |
|   3 |    HASH UNIQUE                   |                  |    57 |  8550 |  6159  (43)| 00:00:07 |       |       |
|*  4 |     HASH JOIN                    |                  |    57 |  8550 |  6158  (43)| 00:00:07 |       |       |
|*  5 |      HASH JOIN                   |                  |  2376 |   129K|  4267  (51)| 00:00:05 |       |       |
|*  6 |       TABLE ACCESS BY INDEX ROWID| TAC_LOGIN_ATOK   |   126 |  3150 |   238   (2)| 00:00:01 |       |       |
|*  7 |        INDEX RANGE SCAN          | TAC_LA_LOGIN_ID  |   859 |       |     4   (0)| 00:00:01 |       |       |
|   8 |       PARTITION RANGE ITERATOR   |                  |  1090K|    32M|  3882  (52)| 00:00:04 |   KEY |   KEY |
|*  9 |        TABLE ACCESS FULL         | TAC_FORM_ATOK    |  1090K|    32M|  3882  (52)| 00:00:04 |   KEY |   KEY |
|  10 |      PARTITION RANGE ITERATOR    |                  | 13852 |  1271K|  1889  (23)| 00:00:02 |   KEY |   KEY |
|* 11 |       TABLE ACCESS FULL          | TAC_FORM_SUMMARY | 13852 |  1271K|  1889  (23)| 00:00:02 |   KEY |   KEY |
-------------------------------------------------------------------------------------------------------------

Perfect! This solves our issue. I will probably use the hint solution in production.

Summary of solutions

1. Deleting stats for the partitioned table - not a good approach; I don't like it.

2. Using the earlier optimizer version (10.2.0.3) for the whole database - not recommended; there might be other fixes in the new version (10.2.0.4.0) even though this particular problem exists.

3. Using query hints to force the earlier optimizer (10.2.0.3) - seems the most sensible to me, and I plan to apply it case by case.

***