
Sep 10, 2009

Madness for Stale Statistics

Yesterday evening (my time zone), which was Wednesday morning for the application users in a different time zone, the database was behaving strangely - literally crawling! As it was not my office hour, I got an unofficial phone call from the monitoring guy. A quick look at the email threads showed that others were already on the issue, looking into things and discussing them. Seeing that, I offloaded myself from the issue and went to bed - it was late at night in the holy month of "Ramadan". I hoped that we would survive and that I would look at things during my next office hour.

Probably because of Murphy's law, I got a phone call from my tech boss at around 2:20 AM :-)


Things were running slower than on other days. For the last few days, the database had been consuming about 50% CPU on average. But today it was at 70-80%, and overall response time had increased. Yet we had not made any recent changes to the database.

I looked at the AWR and ASH reports and ran my custom queries... everything was very confusing. Queries were taking longer than before. I was looking here and there... in the middle of the night, the brain was functioning slowly.

Finally, after 15-20 minutes, to verify whether our statistics were correct, I ran the following script -

SET SERVEROUT ON;

DECLARE
  ObjList DBMS_STATS.ObjectTab;
BEGIN
  -- 'LIST STALE' only reports objects with stale statistics; it gathers nothing
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'PROD', objlist => ObjList, options => 'LIST STALE');
  -- 1..COUNT (rather than FIRST..LAST) avoids an error when the list is empty
  FOR i IN 1 .. ObjList.COUNT
  LOOP
    DBMS_OUTPUT.PUT_LINE(ObjList(i).ObjName || ' - ' || ObjList(i).ObjType || ' ' || ObjList(i).partname);
  END LOOP;
END;
/

What I saw was not what I expected: there were about 75 tables with outdated statistics!
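As a cross-check, the same staleness can be eyeballed from the data dictionary: compare the DML counters Oracle keeps per monitored table with the row count recorded at the last analyze. A hedged sketch, assuming table monitoring is enabled (the 10g default) and DBA privileges:

BEGIN
  -- flush the in-memory DML counters down to the dictionary first
  DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
END;
/

SELECT m.table_name,
       m.inserts, m.updates, m.deletes,
       t.num_rows, t.last_analyzed
  FROM dba_tab_modifications m
  JOIN dba_tables t
    ON t.owner = m.table_owner
   AND t.table_name = m.table_name
 WHERE m.table_owner = 'PROD'
 ORDER BY m.inserts + m.updates + m.deletes DESC;

Tables whose insert/update/delete totals are large relative to num_rows are the ones the 'LIST STALE' script flags.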

I started collecting statistics table by table - the important ones first. The server load gradually decreased. To be quick, I collected stats with no histograms and a 50% sample size - for us, a big sample size works better.

SQL> exec dbms_stats.gather_table_stats(ownname=>'PROD', tabname=>'USER', cascade=>true, degree=>2, estimate_percent=>50, method_opt=>'FOR ALL COLUMNS SIZE 1');


It took about 2.5 hours to collect stats for about 150 tables. Another sleepless night!


What happened?

Statistics had been collected by the default DBMS Scheduler job, running at a one-day interval in the weekday maintenance window. The sample size was also low - only 5%. Those settings were no longer appropriate for us, as tables were becoming stale quite frequently.
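The nightly job picks up its sample size from the DBMS_STATS defaults, so the 5% figure can be raised once instead of overriding it per table. A minimal sketch using the 10g-style SET_PARAM / GET_PARAM calls (assumes DBA privileges; the 50% value mirrors what worked for us above):

BEGIN
  -- raise the default sample size used when no estimate_percent is passed
  DBMS_STATS.SET_PARAM('ESTIMATE_PERCENT', '50');
END;
/

-- read the setting back to confirm
SELECT DBMS_STATS.GET_PARAM('ESTIMATE_PERCENT') FROM dual;

This changes the default for all future gathers, including the maintenance-window job, so it is worth weighing the longer gathering time against the better plans a bigger sample gave us.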