Some background to start with –
We have n-tier J2EE web applications and we use spring jdbc templates. We use connection pool, however we don’t use statement caching. We have a statement like this which runs about 10 to 30 times per second at the moment and commits as well.
INSERT INTO lb_stat (id, lb_id, login_id, read_date, lb_ufid, lb_created) VALUES (lb_status_seq.NEXTVAL, :1, :2, :3, :4, :5);
The table where we insert, is a nested partitioned one (range- hash). We have one unique local index on that table. Currently the table contains around 51 million records.
Most of the times, I see some concurrency waits associated with this insert statement when I look at OEM Top Activity. The waits are on “library cache: mutex x”. Sometimes this problem goes very severe and database load spikes and connection pool exhausts. I made sure we did not do anything wrong which allure Oracle to behave like this. I have checked, as we are using bind variables properly, the statement is not producing many child cursors, and not invalidating those child. People have been saying, there might have mutex problems while there are many child cursors – which is not our case, anyways.
We have been using Oracle version 11.2.0.2.0 fresh - no patch yet. My guess is - probably we hit the bug on this - Bug 13524966 or Bug 12966939 or something similar, there are tons of mutex bug reported to Oracle support.
To analyze further, I looked at the web and found some really nice articles on mutex problem. Here is one -
http://andreynikolaev.wordpress.com/2011/05/01/divide-and-conquer-the-true-mutex-contention
To conclude that it's a bug - I exchanged a couple of mails with the writer who is an expert on mutex problem - thanks to Andrey! :-)
It finally revealed that, when there are many calls on that particular sequence, the mutex holds on that and holding time sometimes over 1 ms - which is huge in mutex world. We planned to get rid of the sequence in next release. Also, will be upgrading to 11.2.0.3.0 soon.
Here is some handy scripts I collected to initiate the mutex waits investigation.
/* Library Cache Object heavily accessed */
col name format a20
col cursor format a12 noprint
col type format a7
col LOCKED_TOTAL heading Locked format 99999
col PINNED_TOTAL heading Pinned format 99999999
col EXECUTIONS heading Executed format 99999999
col NAMESPACE heading Nsp format 999
set wrap on
set linesize 200
set pagesize 1000
select * from (
select case when (kglhdadr = kglhdpar) then 'Parent' else 'Child '||kglobt09 end cursor,
kglhdadr ADDRESS,substr(kglnaobj,1,20) name, kglnahsh hash_value,kglobtyd type,kglobt23 LOCKED_TOTAL,kglobt24 PINNED_TOTAL,kglhdexc EXECUTIONS,kglhdnsp NAMESPACE
from x$kglob
order by kglobt24 desc)
where rownum <= 10;
ADDRESS NAME HASH_VALUE TYPE Locked Pinned Executed Nsp
---------------- -------------------- ---------- ------- ------ --------- --------- ----
00000005D1875E38 SELECT 'x' FROM DUAL 4261573303 CURSOR ###### ######### 5232320 0
00000006119FB278 UA_SEQ 1535242682 SEQUENC ###### ######### 0 1
E
000000061190AF30 select login0_.id as 459939420 CURSOR ###### 65289824 5911243 0
000000064F4643F0 MGMT_NOTIFY_Q 1523339315 QUEUE ###### 63487340 249 10
00000005F66D2308 select login0_.id as 459939420 CURSOR ###### 58493935 12382999 0
000000065FACDD90 select /*+ connect_b 487379649 CURSOR ###### 47798467 6782438 0
00000005AE8F3F68 LB_STATUS_SEQ 3672626424 SEQUENC ###### 40551649 0 1
E
/* Mutex ASH Wsits in last 30 min */
col sample_id noprint format 9999999
col sample_time format a8
col session_id heading "SID" format 9999
col session_serial# noprint format 9999
col event format a19
col blocking_sid heading "BLKS" format 9999
col shared_refcount noprint heading "RFC" format 99
col location_id heading "LOC" format 99
col sleeps noprint format 99999
col mutex_object format a15
set pagesize 1000
set wrap off
SELECT sample_id,to_char(sample_time,'hh24:mi:ss') sample_time,session_id,session_serial#,sql_id,event,p1 IDN,
FLOOR (p2/POWER(2,4*ws)) blocking_sid,MOD (p2,POWER(2,4*ws)) shared_refcount,
FLOOR (p3/POWER (2,4*ws)) location_id,MOD (p3,POWER(2,4*ws)) sleeps,
CASE WHEN (event LIKE 'library cache:%' AND p1 <= power(2,17)) THEN 'library cache bucket: '||p1
ELSE (SELECT kglnaobj FROM x$kglob WHERE kglnahsh=p1 AND (kglhdadr = kglhdpar) and rownum=1) END mutex_object
FROM (SELECT DECODE (INSTR (banner, '64'), 0, '4', '8') ws FROM v$version WHERE ROWNUM = 1) wordsize,
v$active_session_history
WHERE p1text='idn' AND session_state='WAITING'
AND sample_time BETWEEN (sysdate - 30/1440) AND sysdate
ORDER BY sample_id;
SAMPLE_T SID SQL_ID EVENT IDN BLKS LOC MUTEX_OBJECT
-------- ----- ------------- ------------------- ---------- ----- --- ---------------
07:26:55 1207 09h36susy1t4n library cache: mute 3672626424 162 0 LB_STATUS_SEQ
07:27:05 1487 09h36susy1t4n library cache: mute 3672626424 1476 0 LB_STATUS_SEQ
07:28:57 1480 library cache: mute 2321070391 658 0 PROD
07:28:57 658 library cache: mute 2321070391 1429 0 PROD
07:28:58 1429 library cache: mute 2924456787 1480 0 DATABASE
07:29:19 637 09h36susy1t4n library cache: mute 3672626424 435 0 LB_STATUS_SEQ
07:32:18 171 09h36susy1t4n library cache: mute 3672626424 373 0 LB_STATUS_SEQ
07:33:20 1426 bg7mbt0mq5j75 library cache: mute 2321070391 645 0 PROD
07:33:58 931 library cache: mute 2321070391 1003 0 PROD
07:33:58 803 library cache: mute 2321070391 931 0 PROD
07:34:26 236 09h36susy1t4n library cache: mute 3672626424 715 0 LB_STATUS_SEQ
07:38:57 1272 library cache: mute 2321070391 914 0 PROD
07:38:57 449 library cache: mute 2321070391 914 0 PROD
07:39:43 517 09h36susy1t4n library cache: mute 3672626424 804 0 LB_STATUS_SEQ
07:39:46 507 09h36susy1t4n library cache: mute 3672626424 1004 0 LB_STATUS_SEQ
07:39:53 665 09h36susy1t4n library cache: mute 3672626424 914 0 LB_STATUS_SEQ
07:40:40 592 library cache: mute 2924456787 720 0 DATABASE
07:41:06 990 09h36susy1t4n library cache: mute 3672626424 221 0 LB_STATUS_SEQ
/* Mutex impact */
select event, p1,cnt,
CASE WHEN (event LIKE 'library cache:%' AND p1 <= power(2,17)) THEN
'library cache bucket: '||p1 ELSE (SELECT kglnaobj FROM x$kglob WHERE
kglnahsh=p1 AND (kglhdadr = kglhdpar) and rownum=1) END mutex_object
from (select p1,event,count(*) cnt from v$active_session_history where
p1text='idn' and session_state='WAITING'
group by p1,event)
order by cnt desc;
EVENT P1 CNT MUTEX_OBJECT
------------------- ---------- ---------- ---------------
library cache: mute 2321070391 286 PROD
library cache: mute 2924456787 269 DATABASE
library cache: mute 3672626424 100 LB_STATUS_SEQ
library cache: mute 1535242682 16 UA_SEQ
library cache: mute 370351507 5 select substrb(
library cache: mute 950094862 4 select min(minb
library cache: mute 57559 3 library cache b
library cache: mute 1455413935 3 select min(minb
library cache: mute 459016664 2 SYS_GROUP
And of course this mutex_stats.sql useful script by Andrey.
SQL> @mutex_stats.sql mutex_identifier=3672626424
Connected.
--------------------------------------------
Statistics for "Library Cache" mutex
idn: 3672626424 address 0x00000005AE8F4098
Interval: 31.8 s, gets: 1471, sleeps:4300
Requests rate: lambda= 46.3 Hz
Sleeps rate: omega= 135.3 Hz
Utilization: rho= .040000
Exclusive Utilization: rho_x= .040000
Shared Utilization: rho_s= .000000
Avg. holding time: S= 864.45 us
Service rate: mu= 1156.8 Hz
Spin inefficiency: k= #########
Secondary statistics:
Slps /Miss: kappa=#########
Spin_gets/miss: sigma=#########
correction coeff. eta= 1.0
Secondary sleeps ratio .745
--------------------------------------------
Avg. number of sessions waiting: .02
Library cache object related to mutex idn 3672626424 :
SEQUENCE: PROD.LB_STATUS_SEQ
ADDR: TYPE PIN/s: LOCK/s: EXEC/s: Namespc
-------------------------------------------------------------------
00000005AE8F3F68 Parent 15 0 0 1
Mutex related parameters:
_kgl_mutex_wait_time=0 _wait_yield_hp_mode=yield
_wait_yield_mode=yield _wait_yield_sleep_freq=100 _wait_yield_sleep_time_msecs=1
_wait_yield_yield_freq=20
PL/SQL procedure successfully completed.
***
We have n-tier J2EE web applications and we use spring jdbc templates. We use connection pool, however we don’t use statement caching. We have a statement like this which runs about 10 to 30 times per second at the moment and commits as well.
INSERT INTO lb_stat (id, lb_id, login_id, read_date, lb_ufid, lb_created) VALUES (lb_status_seq.NEXTVAL, :1, :2, :3, :4, :5);
The table where we insert, is a nested partitioned one (range- hash). We have one unique local index on that table. Currently the table contains around 51 million records.
Most of the times, I see some concurrency waits associated with this insert statement when I look at OEM Top Activity. The waits are on “library cache: mutex x”. Sometimes this problem goes very severe and database load spikes and connection pool exhausts. I made sure we did not do anything wrong which allure Oracle to behave like this. I have checked, as we are using bind variables properly, the statement is not producing many child cursors, and not invalidating those child. People have been saying, there might have mutex problems while there are many child cursors – which is not our case, anyways.
We have been using Oracle version 11.2.0.2.0 fresh - no patch yet. My guess is - probably we hit the bug on this - Bug 13524966 or Bug 12966939 or something similar, there are tons of mutex bug reported to Oracle support.
To analyze further, I looked at the web and found some really nice articles on mutex problem. Here is one -
http://andreynikolaev.wordpress.com/2011/05/01/divide-and-conquer-the-true-mutex-contention
To conclude that it's a bug - I exchanged a couple of mails with the writer who is an expert on mutex problem - thanks to Andrey! :-)
It finally revealed that, when there are many calls on that particular sequence, the mutex holds on that and holding time sometimes over 1 ms - which is huge in mutex world. We planned to get rid of the sequence in next release. Also, will be upgrading to 11.2.0.3.0 soon.
Here is some handy scripts I collected to initiate the mutex waits investigation.
/* Library Cache Object heavily accessed */
col name format a20
col cursor format a12 noprint
col type format a7
col LOCKED_TOTAL heading Locked format 99999
col PINNED_TOTAL heading Pinned format 99999999
col EXECUTIONS heading Executed format 99999999
col NAMESPACE heading Nsp format 999
set wrap on
set linesize 200
set pagesize 1000
select * from (
select case when (kglhdadr = kglhdpar) then 'Parent' else 'Child '||kglobt09 end cursor,
kglhdadr ADDRESS,substr(kglnaobj,1,20) name, kglnahsh hash_value,kglobtyd type,kglobt23 LOCKED_TOTAL,kglobt24 PINNED_TOTAL,kglhdexc EXECUTIONS,kglhdnsp NAMESPACE
from x$kglob
order by kglobt24 desc)
where rownum <= 10;
ADDRESS NAME HASH_VALUE TYPE Locked Pinned Executed Nsp
---------------- -------------------- ---------- ------- ------ --------- --------- ----
00000005D1875E38 SELECT 'x' FROM DUAL 4261573303 CURSOR ###### ######### 5232320 0
00000006119FB278 UA_SEQ 1535242682 SEQUENC ###### ######### 0 1
E
000000061190AF30 select login0_.id as 459939420 CURSOR ###### 65289824 5911243 0
000000064F4643F0 MGMT_NOTIFY_Q 1523339315 QUEUE ###### 63487340 249 10
00000005F66D2308 select login0_.id as 459939420 CURSOR ###### 58493935 12382999 0
000000065FACDD90 select /*+ connect_b 487379649 CURSOR ###### 47798467 6782438 0
00000005AE8F3F68 LB_STATUS_SEQ 3672626424 SEQUENC ###### 40551649 0 1
E
/* Mutex ASH Wsits in last 30 min */
col sample_id noprint format 9999999
col sample_time format a8
col session_id heading "SID" format 9999
col session_serial# noprint format 9999
col event format a19
col blocking_sid heading "BLKS" format 9999
col shared_refcount noprint heading "RFC" format 99
col location_id heading "LOC" format 99
col sleeps noprint format 99999
col mutex_object format a15
set pagesize 1000
set wrap off
SELECT sample_id,to_char(sample_time,'hh24:mi:ss') sample_time,session_id,session_serial#,sql_id,event,p1 IDN,
FLOOR (p2/POWER(2,4*ws)) blocking_sid,MOD (p2,POWER(2,4*ws)) shared_refcount,
FLOOR (p3/POWER (2,4*ws)) location_id,MOD (p3,POWER(2,4*ws)) sleeps,
CASE WHEN (event LIKE 'library cache:%' AND p1 <= power(2,17)) THEN 'library cache bucket: '||p1
ELSE (SELECT kglnaobj FROM x$kglob WHERE kglnahsh=p1 AND (kglhdadr = kglhdpar) and rownum=1) END mutex_object
FROM (SELECT DECODE (INSTR (banner, '64'), 0, '4', '8') ws FROM v$version WHERE ROWNUM = 1) wordsize,
v$active_session_history
WHERE p1text='idn' AND session_state='WAITING'
AND sample_time BETWEEN (sysdate - 30/1440) AND sysdate
ORDER BY sample_id;
SAMPLE_T SID SQL_ID EVENT IDN BLKS LOC MUTEX_OBJECT
-------- ----- ------------- ------------------- ---------- ----- --- ---------------
07:26:55 1207 09h36susy1t4n library cache: mute 3672626424 162 0 LB_STATUS_SEQ
07:27:05 1487 09h36susy1t4n library cache: mute 3672626424 1476 0 LB_STATUS_SEQ
07:28:57 1480 library cache: mute 2321070391 658 0 PROD
07:28:57 658 library cache: mute 2321070391 1429 0 PROD
07:28:58 1429 library cache: mute 2924456787 1480 0 DATABASE
07:29:19 637 09h36susy1t4n library cache: mute 3672626424 435 0 LB_STATUS_SEQ
07:32:18 171 09h36susy1t4n library cache: mute 3672626424 373 0 LB_STATUS_SEQ
07:33:20 1426 bg7mbt0mq5j75 library cache: mute 2321070391 645 0 PROD
07:33:58 931 library cache: mute 2321070391 1003 0 PROD
07:33:58 803 library cache: mute 2321070391 931 0 PROD
07:34:26 236 09h36susy1t4n library cache: mute 3672626424 715 0 LB_STATUS_SEQ
07:38:57 1272 library cache: mute 2321070391 914 0 PROD
07:38:57 449 library cache: mute 2321070391 914 0 PROD
07:39:43 517 09h36susy1t4n library cache: mute 3672626424 804 0 LB_STATUS_SEQ
07:39:46 507 09h36susy1t4n library cache: mute 3672626424 1004 0 LB_STATUS_SEQ
07:39:53 665 09h36susy1t4n library cache: mute 3672626424 914 0 LB_STATUS_SEQ
07:40:40 592 library cache: mute 2924456787 720 0 DATABASE
07:41:06 990 09h36susy1t4n library cache: mute 3672626424 221 0 LB_STATUS_SEQ
/* Mutex impact */
select event, p1,cnt,
CASE WHEN (event LIKE 'library cache:%' AND p1 <= power(2,17)) THEN
'library cache bucket: '||p1 ELSE (SELECT kglnaobj FROM x$kglob WHERE
kglnahsh=p1 AND (kglhdadr = kglhdpar) and rownum=1) END mutex_object
from (select p1,event,count(*) cnt from v$active_session_history where
p1text='idn' and session_state='WAITING'
group by p1,event)
order by cnt desc;
EVENT P1 CNT MUTEX_OBJECT
------------------- ---------- ---------- ---------------
library cache: mute 2321070391 286 PROD
library cache: mute 2924456787 269 DATABASE
library cache: mute 3672626424 100 LB_STATUS_SEQ
library cache: mute 1535242682 16 UA_SEQ
library cache: mute 370351507 5 select substrb(
library cache: mute 950094862 4 select min(minb
library cache: mute 57559 3 library cache b
library cache: mute 1455413935 3 select min(minb
library cache: mute 459016664 2 SYS_GROUP
And of course this mutex_stats.sql useful script by Andrey.
SQL> @mutex_stats.sql mutex_identifier=3672626424
Connected.
--------------------------------------------
Statistics for "Library Cache" mutex
idn: 3672626424 address 0x00000005AE8F4098
Interval: 31.8 s, gets: 1471, sleeps:4300
Requests rate: lambda= 46.3 Hz
Sleeps rate: omega= 135.3 Hz
Utilization: rho= .040000
Exclusive Utilization: rho_x= .040000
Shared Utilization: rho_s= .000000
Avg. holding time: S= 864.45 us
Service rate: mu= 1156.8 Hz
Spin inefficiency: k= #########
Secondary statistics:
Slps /Miss: kappa=#########
Spin_gets/miss: sigma=#########
correction coeff. eta= 1.0
Secondary sleeps ratio .745
--------------------------------------------
Avg. number of sessions waiting: .02
Library cache object related to mutex idn 3672626424 :
SEQUENCE: PROD.LB_STATUS_SEQ
ADDR: TYPE PIN/s: LOCK/s: EXEC/s: Namespc
-------------------------------------------------------------------
00000005AE8F3F68 Parent 15 0 0 1
Mutex related parameters:
_kgl_mutex_wait_time=0 _wait_yield_hp_mode=yield
_wait_yield_mode=yield _wait_yield_sleep_freq=100 _wait_yield_sleep_time_msecs=1
_wait_yield_yield_freq=20
PL/SQL procedure successfully completed.
***
No comments:
Post a Comment