On Monday morning, the first working day after the cutover from primary site to secondary site, we were experiencing unexpected problems on database server which was sitting idle for long time.
The load avg went high over 10. We are running on 8 core cpu box, this amount of load is totally unexpected, our regular load avg is 2-3 on a busy day.
I looked at the top queries and top events. There were waits associated with every queries. Queries were waiting for some scheduler activities... and the top queries are not the "actual" top ones I knew - the top list were being changed frequently based on users' current activities. The db server was crawling!
I found that the "resmgr:cpu quantum" was listed as top wait events and after running ADDM, I was confirmed seeing this - "Wait class Scheduler was consuming significant database time". It had become almost clear that some resource management settings are wrong/different on this database server.
ADDM analysis revealed - "Wait class Scheduler was consuming significant database time".
Now it's time to check those parameters. Yes!! I guessed it correct - two servers resource management parameters were not identical! Shocking!! The maintenance window was still running which activated the resource manager plan - this should not be the correct settings.
On the well performed server (where the application was running before the cutover), the values were -
SQL> select name, value
from v$parameter
where name in ('resource_manager_plan', 'resource_limit');
NAME VALUE
-------------------------------------- ----------
resource_limit TRUE
resource_manager_plan
And on the "problem server", the settings were like -
SQL> select name, value
from v$parameter
where name in ('resource_manager_plan', 'resource_limit');
NAME VALUE
-------------------------------------- ----------
resource_limit FALSE
resource_manager_plan SCHEDULER[0x22F0]:SYSTEM_PLAN
Ah - completely different, resource manager got activated here by setting the value for "resource_manager_plan" by the maintenance window which had been configured to run long hours. So, identified the root cause, after changing the value (no manager plan), all those scheduler wait events disappeared and database was behaving according to expectation - load average dropped!
SQL> alter system set resource_manager_plan='' scope=both;
System altered.
This SYSTEM_PLAN was not appropriate for the application schema. The maintenance window should have been finished by the time and we should not run on this resource manager plan after 8 AM in the morning in any circumstances.
Let's take a look what those parameter are all about -
RESOURCE_LIMIT: This parameter must be set to true to enforce resource limits assigned to a user through profiles.
RESOURCE_MANAGER_PLAN: Setting this parameter activates the resource manager. The resource manager is enabled by setting the resource_manager_plan initialization parameter to an existing plan. Note: When enabled, the DBMS Scheduler can automatically change the Resource Manager plan at Scheduler window boundaries. So keeping the window only open for a specified period of time (idle time) is a good thing.
Luck! always get volunteered to investigate such critical problems outside office hours (and very late at night for timezone difference) which is not appreciated by my family members always. I knew that these things were handled by the other database guy. Communication had always been a big issue for companies and being a global company, we are not exceptional!
The load avg went high over 10. We are running on 8 core cpu box, this amount of load is totally unexpected, our regular load avg is 2-3 on a busy day.
I looked at the top queries and top events. There were waits associated with every queries. Queries were waiting for some scheduler activities... and the top queries are not the "actual" top ones I knew - the top list were being changed frequently based on users' current activities. The db server was crawling!
I found that the "resmgr:cpu quantum" was listed as top wait events and after running ADDM, I was confirmed seeing this - "Wait class Scheduler was consuming significant database time". It had become almost clear that some resource management settings are wrong/different on this database server.
Top User Events
Event | Event Class | % Activity | Avg Active Sessions |
---|---|---|---|
resmgr:cpu quantum | Scheduler | 92.26 | 149.47 |
CPU + Wait for CPU | CPU | 6.63 | 10.73 |
ADDM analysis revealed - "Wait class Scheduler was consuming significant database time".
Now it's time to check those parameters. Yes!! I guessed it correct - two servers resource management parameters were not identical! Shocking!! The maintenance window was still running which activated the resource manager plan - this should not be the correct settings.
On the well performed server (where the application was running before the cutover), the values were -
SQL> select name, value
from v$parameter
where name in ('resource_manager_plan', 'resource_limit');
-------------------------------------- ----------
resource_limit
resource_manager_plan
And on the "problem server", the settings were like -
SQL> select name, value
from v$parameter
where name in ('resource_manager_plan', 'resource_limit');
-------------------------------------- ----------
resource_limit
resource_manager_plan
Ah - completely different, resource manager got activated here by setting the value for "resource_manager_plan" by the maintenance window which had been configured to run long hours. So, identified the root cause, after changing the value (no manager plan), all those scheduler wait events disappeared and database was behaving according to expectation - load average dropped!
SQL> alter system set resource_manager_plan='' scope=both;
System altered.
This SYSTEM_PLAN was not appropriate for the application schema. The maintenance window should have been finished by the time and we should not run on this resource manager plan after 8 AM in the morning in any circumstances.
Let's take a look what those parameter are all about -
RESOURCE_LIMIT: This parameter must be set to true to enforce resource limits assigned to a user through profiles.
RESOURCE_MANAGER_PLAN: Setting this parameter activates the resource manager. The resource manager is enabled by setting the resource_manager_plan initialization parameter to an existing plan. Note: When enabled, the DBMS Scheduler can automatically change the Resource Manager plan at Scheduler window boundaries. So keeping the window only open for a specified period of time (idle time) is a good thing.
Luck! always get volunteered to investigate such critical problems outside office hours (and very late at night for timezone difference) which is not appreciated by my family members always. I knew that these things were handled by the other database guy. Communication had always been a big issue for companies and being a global company, we are not exceptional!
1 comment:
I had the same problem. Thank you.
Post a Comment