Mar 22, 2014

Database query runs slow in second execution - why?

When a query returns result quickly in the first run but the same query runs slow in second execution or successive executions, it is probably for the cardinality feedback the query used for the other executions. We had a scenario recently.

The first execution returns in less than a second - 78 records. When the query goes for successive executions, it takes over 5 minutes to return the same result. This slow execution of the query slowed down one of the application servers and caused huge number of thread stuck and bog down the server at the end. I thought, this happened for the wrong cardinality feedback from the first execution of the query.

To become very sure about this, I did the following tests - 

SQL> alter session set statistics_level=all;
SQL> set autotrace traceon;

SQL> SELECT l.id, l.login_name, l.first_name, l.last_name, l.title_id, l.title 
FROM table_x l, view_y pr 
WHERE l.id = pr.login_id
AND pr.pgm_id IN (68894) 
AND l.enroll = 0 
AND l.status = 1 
ORDER BY l.last_name, l.first_name, l.login_name, l.title;

Elapsed: 00:00:00.78

Statistics
----------------------------------------------------------
          8  recursive calls
          3  db block gets
      16074  consistent gets
        629  physical reads
          0  redo size
       5783  bytes sent via SQL*Net to client
        575  bytes received via SQL*Net from client
          7  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         79  rows processed

SQL> /

Elapsed: 00:05:49.29

Statistics
----------------------------------------------------------
       4069  recursive calls
          8  db block gets
      47259  consistent gets
     518490  physical reads
        272  redo size
       5783  bytes sent via SQL*Net to client
        575  bytes received via SQL*Net from client
          7  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         79  rows processed

Note
-----
   - cardinality feedback used for this statement


To get rid of the cardinality feedback for this test case, I used hint with the query to tell the optimizer not to use the feedback. FYI - cardinality feedback could be turned off in session level and instance level by using ALTER SESSION SET "_OPTIMIZER_USE_FEEDBACK"=FALSE;


SQL> SELECT /*+ OPT_PARAM('_OPTIMIZER_USE_FEEDBACK' 'FALSE') */ l.id, l.login_name, l.first_name, l.last_name, l.title_id, l.title 
FROM table_x l, view_y pr 
WHERE l.id = pr.login_id
AND pr.pgm_id IN (68894) 
AND l.enroll = 0 
AND l.status = 1 
ORDER BY l.last_name, l.first_name, l.login_name, l.title;


Elapsed: 00:00:00.65

Statistics
----------------------------------------------------------
          0  recursive calls
          3  db block gets
      16069  consistent gets
          0  physical reads
         64  redo size
       5783  bytes sent via SQL*Net to client
        575  bytes received via SQL*Net from client
          7  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         79  rows processed


A word about cardinality feedback

Cardinality feedback is introduced in Oracle 11g to make the optimizer working better. To describe the feature in short - when a query is being executed repetitively, it sends cardinality feedback if the estimated and actual rows have been revealed different in earlier executions. Anyways, we had a scenario related to cardinality feedback. We had complex SQLs which had been running in 1 to 2 seconds for the first run then suddenly it started taking several minutes!

Why cardinality feedback was being used?

* We have very complex underlying predicates in the view - that's could be one reason for not estimating correct cardinality in the execution phase. For the complex view, when many rows returns in intermediate steps in the executions, that makes the cardinality estimation even worse in our case. 

* Stale statistics could be another reason for wrong cardinality estimation. But that was not in our case, I checked that, there was no stale statistics for the tables used in the query.


So, for the pr.pgm_id 68894, the first execution luckily ends up with best execution plan. Oracle optimizer is intelligent - it reports a mismatch in cardinality estimate in the first run and gives feedback for successive runs. The cardinality feedback drove the query to go for full table scans on the next run on several tables for the underlying complex view "view_y". For the pr.pgm_id 68894, the query had to process about 10 GB of data (detailed could be found in SQL Monitoring in OEM) when it went for full table scans. As it had to process that amount of data, it did not use regular buffer cache, instead it used available memory in PGA and temporary tablespace to hold and process those huge amount of data. That aggravated the situation slowing down for doing excessive physical reads. 

Now we'll fix this in the application code - we don't want to turn the cardinality feedback feature off at instance level for such discrete instance for a single query. 


***

No comments: