There are many arguments we can pass in dbms_stats.gather_table_stats procedure - "method_opt" and "estimate_percent" are two very important ones.
The default values (in 10g) for "method_opt" and "estimate_percent" are AUTO - meaning that Oracle will decide if Histogram Stats or No Histogram is appropriate for columns.
Sometimes, we customizes sample size to make stats collection job faster. This customization should be perfect otherwise a low sample size could result in non-representative statistics.
For example, when I collect statistics with 5% sample size, it does not create histograms for the two following columns.
SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'TEST', tabname=>'TAB1', degree=>2, estimate_percent=>5);
SQL> SELECT column_name, num_buckets, num_distinct, sample_size, histogram
FROM user_tab_col_statistics
WHERE table_name='TAB1';
COLUMN_NAME NUM_BUCKETS NUM_DISTINCT SAMPLE_SIZE HISTOGRAM
------------------------------ ----------- ------------ ----------- ---------------
UPDATED 1 6382137 633875 NONE
CREATED 1 7858623 633875 NONE
VERSION 15 15 633875 FREQUENCY
ID 1 12677500 633875 NONE
TRASH_DATE 1 294401 221149 NONE
Now, when I increased the sample size to 50%, the procedure collected Height Balanced Histograms for the two columns. See below -
SQL> exec DBMS_STATS.GATHER_TABLE_STATS(ownname=>'TEST', tabname=>'TAB1', degree=>2, estimate_percent=>50);
SQL> SELECT column_name, num_buckets, num_distinct, sample_size, histogram
FROM user_tab_col_statistics
WHERE table_name='TAB1';
COLUMN_NAME NUM_BUCKETS NUM_DISTINCT SAMPLE_SIZE HISTOGRAM
------------------------------ ----------- ------------ ----------- ---------------
UPDATED 254 8929376 6347812 HEIGHT BALANCED
CREATED 254 9793629 6345628 HEIGHT BALANCED
VERSION 21 21 6341908 FREQUENCY
ID 1 12692852 6346426 NONE
TRASH_DATE 1 1217543 2208858 NONE
The different statistics could affect the execution plan for the queries where CREATED/UPDATED columns are used. So, these things should be carefully tested and sample size should not be underestimated. In fact, in my experience I have seen, larger sample size works better for big tables.
P.S. I planned to write a post on this probably in January this year, when I did the experiments on database statistics with histogram and no histogram options. Later became busy with other stuffs and forgot. Few days ago, I worked again to write custom scripts for collecting database stats. The sample size problem stroked - I set the sample size 50% in my scripts - working perfectly!
***
No comments:
Post a Comment