Oct 5, 2009

Optimizer glitch for partitioned table in version 10.2.0.4.0

Recently discovered that, partitioned tables are being accessed through global index rather than partitioned index, even though there are enough reasons to access via local/partitioned index.

Initially thought, as we turned off bind variable peek for execution plan stability (hidden system parameter _optim_peek_user_binds to FALSE), optimizer is not looking at the bind values and doesn't know which partition to access. So, accessing through a global index seemed logical. But there were other reasons!

At some point, we will need to access partitions through local index which is more efficient than accessing through global index. So I did more experiments on these. In my testing, I used literals instead of bind variables - but strangely, the execution plan did not change according to my expectation and was showing old plan with global index access!

The plan looked like -

------------------------------------------------------------------------------------------------------------

| Id  | Operation                               | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |

------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                  |     5 |   750 |  1748   (2)| 00:00:02 |       |       |
|*  1 |  FILTER                                 |                  |       |       |            |          |       |       |
|   2 |   SORT ORDER BY                         |                  |     5 |   750 |  1748   (2)| 00:00:02 |       |       |
|   3 |    HASH UNIQUE                          |                  |     5 |   750 |  1747   (1)| 00:00:02 |       |       |
|   4 |     NESTED LOOPS                        |                  |     5 |   750 |  1746   (1)| 00:00:02 |       |       |
|   5 |      NESTED LOOPS                       |                  |   204 | 11424 |  1333   (1)| 00:00:02 |       |       |
|*  6 |       TABLE ACCESS BY INDEX ROWID       | TAC_LOGIN_ATOK   |    11 |   275 |    23   (0)| 00:00:01 |       |       |
|*  7 |        INDEX RANGE SCAN                 | TAC_LA_LOGIN_ID  |    74 |       |     3   (0)| 00:00:01 |      |       |
|*  8 |       TABLE ACCESS BY GLOBAL INDEX ROWID| TAC_FORM_ATOK    |    19 |   589 |   353   (1)| 00:00:01 | ROWID | ROWID |
|*  9 |        INDEX RANGE SCAN                 | TAC_FA_ATOK_I    |    38 |       |     2   (0)| 00:00:01 |       |       |
|* 10 |      TABLE ACCESS BY GLOBAL INDEX ROWID | TAC_FORM_SUMMARY |     1 |    94 |     2   (0)| 00:00:01 | ROWID | ROWID |
|* 11 |       INDEX UNIQUE SCAN                 | TAC_FS_UFID_PK   |     1 |       |     1   (0)| 00:00:01 |       |       |
-------------------------------------------------------------------------------------------------------------

Something must be wrong! Thought, it could be statistics or could be some other parameters.Rebuilt local index and recollected statistics (histogram/no histogram both), but still the same execution plan.

Then looked at metalink for any possible bug (this good practice I learned from my senior). Bingo! found a related issue reported but not the same thing. Let me share with you -

Bug No. 7210921 - STATISTICS WITH SUBPARTITIONS ARE NOT CORRECTLY INTERPRETED IN EXPLAIN PLANS

It explained that there are optimizer problems for partitioned tables - after generating execution plan by the optimizer, the execution plan shows wrong stats for partitions. Deleting table statistics was the workaround mentioned in metalink for the problem!

I thought, as the problem seems similar, what if I delete the partition table statistics?

SQL> EXEC DBMS_STATS.DELETE_TABLE_STATS(ownname =>'PROD', tabname =>'TAB1');

It helped!! The execution plan got changed and showed partitioned index access. I was not convinced to this solution of deleting table stats - specially for such a big table. There are other queries which would access the table and dynamic sampling (absence of statistics) could be another nightmare.

Then thought, what if I use earlier version of the optimizer?

SQL> alter session set optimizer_features_enable='10.2.0.3';


It worked - execution plan changed! The problem is with the 10.2.0.4.0 version.

If we don't want to change system level or session level settings for optimizer version (which may not be good for a large application), there are ways using hints to force particular optimizer version like this - 

SELECT /*+ optimizer_features_enable('10.2.0.3') */ fs.form_type, fs.prov_id, fs.pgm_id, fs.client_id,
fs.created_by_id, fs.created_date, fs.notif_level, fs.state, fs.form_id, fs.title, fs.summary
FROM .......

Then the execution plan looked like follows, this was what I looked for -

------------------------------------------------------------------------------------------------------------

| Id  | Operation                        | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |

------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                  |    57 |  8550 |  6160  (43)| 00:00:07 |       |       |
|*  1 |  FILTER                          |                  |       |       |            |          |       |       |
|   2 |   SORT ORDER BY                  |                  |    57 |  8550 |  6160  (43)| 00:00:07 |       |       |
|   3 |    HASH UNIQUE                   |                  |    57 |  8550 |  6159  (43)| 00:00:07 |       |       |
|*  4 |     HASH JOIN                    |                  |    57 |  8550 |  6158  (43)| 00:00:07 |       |       |
|*  5 |      HASH JOIN                   |                  |  2376 |   129K|  4267  (51)| 00:00:05 |       |       |
|*  6 |       TABLE ACCESS BY INDEX ROWID| TAC_LOGIN_ATOK   |   126 |  3150 |   238   (2)| 00:00:01 |       |       |
|*  7 |        INDEX RANGE SCAN          | TAC_LA_LOGIN_ID  |   859 |       |     4   (0)| 00:00:01 |              |
|   8 |       PARTITION RANGE ITERATOR   |                  |  1090K|    32M|  3882  (52)| 00:00:04 |   KEY |   KEY |
|*  9 |        TABLE ACCESS FULL         | TAC_FORM_ATOK    |  1090K|    32M|  3882  (52)| 00:00:04 |   KEY |   KEY |
|  10 |      PARTITION RANGE ITERATOR    |                  | 13852 |  1271K|  1889  (23)| 00:00:02 |   KEY |   KEY |
|* 11 |       TABLE ACCESS FULL          | TAC_FORM_SUMMARY | 13852 |  1271K|  1889  (23)| 00:00:02 |   KEY |   KEY |
-------------------------------------------------------------------------------------------------------------

Perfect! This solves our issue. Probably, I will use the hint solution in production.

Summary of solutions

1. Deleting stats for the partitioned table - not be a good approach, don't like

2. Using earlier optimizer version (10.2.0.3.0) for the database - not very recommended, their might be other fixes in the new version (10.2.0.4.0) even though this particular problem exists.

3. Using query hints to use earlier optimizer (10.2.0.3.0) - seems sensible to me and I planned to fix case by case.

***

No comments: