Jul 9, 2012

Concurrency waits "library cache: mutex x" for sequence

Some background to start with –

We have n-tier J2EE web applications and we use spring jdbc templates. We use connection pool, however we don’t use statement caching. We have a statement like this which runs about 10 to 30 times per second at the moment and commits as well.

INSERT INTO lb_stat (id, lb_id, login_id, read_date, lb_ufid, lb_created) VALUES (lb_status_seq.NEXTVAL, :1, :2, :3, :4, :5);

The table where we insert, is a nested partitioned one (range- hash). We have one unique local index on that table. Currently the table contains around 51 million records.


Most of the times, I see some concurrency waits associated with this insert statement when I look at OEM Top Activity. The waits are on “library cache: mutex x”. Sometimes this problem goes very severe and database load spikes and connection pool exhausts. I made sure we did not do anything wrong which allure Oracle to behave like this. I have checked, as we are using bind variables properly, the statement is not producing many child cursors, and not invalidating those child. People have been saying, there might have mutex problems while there are many child cursors – which is not our case, anyways.

We have been using Oracle version 11.2.0.2.0 fresh - no patch yet. My guess is - probably we hit the bug on this - Bug 13524966 or Bug 12966939 or something similar, there are tons of mutex bug reported to Oracle support.

To analyze further, I looked at the web and found some really nice articles on mutex problem. Here is one -

http://andreynikolaev.wordpress.com/2011/05/01/divide-and-conquer-the-true-mutex-contention

To conclude that it's a bug - I exchanged a couple of mails with the writer who is an expert on mutex problem - thanks to Andrey! :-)

It finally revealed that, when there are many calls on that particular  sequence, the mutex holds on that and holding time sometimes over 1 ms - which is huge in mutex world. We planned to get rid of the sequence in next release. Also, will be upgrading to 11.2.0.3.0 soon.


Here is some handy scripts I collected to initiate the mutex waits investigation.

/* Library Cache Object heavily accessed */

col name format a20
col cursor format a12 noprint
col type format a7
col LOCKED_TOTAL heading Locked format 99999
col PINNED_TOTAL heading Pinned format 99999999
col EXECUTIONS heading Executed format 99999999
col NAMESPACE heading Nsp format 999
set wrap on
set linesize 200

set pagesize 1000
 

select * from (
select case when (kglhdadr =  kglhdpar) then 'Parent' else 'Child '||kglobt09 end cursor,
              kglhdadr ADDRESS,substr(kglnaobj,1,20) name, kglnahsh hash_value,kglobtyd type,kglobt23 LOCKED_TOTAL,kglobt24 PINNED_TOTAL,kglhdexc EXECUTIONS,kglhdnsp NAMESPACE
                from x$kglob
                order by kglobt24 desc)
    where rownum <= 10;


ADDRESS          NAME                 HASH_VALUE TYPE    Locked    Pinned  Executed  Nsp                                                                                                               
---------------- -------------------- ---------- ------- ------ --------- --------- ----                                                                                                               
00000005D1875E38 SELECT 'x' FROM DUAL 4261573303 CURSOR  ###### #########   5232320    0                                                                                                               
00000006119FB278 UA_SEQ               1535242682 SEQUENC ###### #########         0    1                                                                                                               
                                                 E                                                                                                                                                     
                                                                                                                                                                                                       
000000061190AF30 select login0_.id as  459939420 CURSOR  ######  65289824   5911243    0                                                                                                               
000000064F4643F0 MGMT_NOTIFY_Q        1523339315 QUEUE   ######  63487340       249   10                                                                                                               
00000005F66D2308 select login0_.id as  459939420 CURSOR  ######  58493935  12382999    0                                                                                                               
000000065FACDD90 select /*+ connect_b  487379649 CURSOR  ######  47798467   6782438    0                                                                                                               
00000005AE8F3F68 LB_STATUS_SEQ        3672626424 SEQUENC ######  40551649         0    1                                                                                                               
                                                 E                                                            
                                                            
/* Mutex ASH Wsits in last 30 min */

col sample_id noprint format 9999999
col sample_time format a8
col session_id heading "SID" format 9999
col session_serial# noprint format 9999
col event format a19
col blocking_sid heading "BLKS" format 9999
col shared_refcount noprint heading "RFC" format 99
col location_id heading "LOC" format 99
col sleeps  noprint format 99999
col mutex_object format a15
set pagesize 1000

set wrap off
 

SELECT sample_id,to_char(sample_time,'hh24:mi:ss') sample_time,session_id,session_serial#,sql_id,event,p1 IDN,
             FLOOR (p2/POWER(2,4*ws)) blocking_sid,MOD (p2,POWER(2,4*ws)) shared_refcount,
             FLOOR (p3/POWER (2,4*ws)) location_id,MOD (p3,POWER(2,4*ws)) sleeps,
             CASE WHEN (event LIKE 'library cache:%' AND p1 <= power(2,17)) THEN  'library cache bucket: '||p1
                 ELSE  (SELECT kglnaobj FROM x$kglob WHERE kglnahsh=p1 AND (kglhdadr = kglhdpar) and rownum=1) END mutex_object
            FROM (SELECT DECODE (INSTR (banner, '64'), 0, '4', '8') ws FROM v$version WHERE ROWNUM = 1) wordsize,
                   v$active_session_history
            WHERE p1text='idn' AND session_state='WAITING'
            AND sample_time BETWEEN (sysdate - 30/1440) AND sysdate
           ORDER BY sample_id;



SAMPLE_T   SID SQL_ID        EVENT                      IDN  BLKS LOC MUTEX_OBJECT                                                                                                                     
-------- ----- ------------- ------------------- ---------- ----- --- ---------------                  

07:26:55  1207 09h36susy1t4n library cache: mute 3672626424   162   0 LB_STATUS_SEQ                                                                                                                    
07:27:05  1487 09h36susy1t4n library cache: mute 3672626424  1476   0 LB_STATUS_SEQ                                                                                                                    
07:28:57  1480               library cache: mute 2321070391   658   0 PROD                                                                                                                           
07:28:57   658               library cache: mute 2321070391  1429   0 PROD                                                                                                                            
07:28:58  1429               library cache: mute 2924456787  1480   0 DATABASE                                                                                                                         
07:29:19   637 09h36susy1t4n library cache: mute 3672626424   435   0 LB_STATUS_SEQ                                                                                                                    
07:32:18   171 09h36susy1t4n library cache: mute 3672626424   373   0 LB_STATUS_SEQ                                                                                                                    
07:33:20  1426 bg7mbt0mq5j75 library cache: mute 2321070391   645   0 PROD                                                                                                                            
07:33:58   931               library cache: mute 2321070391  1003   0 PROD                                                                                                                            
07:33:58   803               library cache: mute 2321070391   931   0 PROD                                                                                                                            
07:34:26   236 09h36susy1t4n library cache: mute 3672626424   715   0 LB_STATUS_SEQ                                                                                                                    
07:38:57  1272               library cache: mute 2321070391   914   0 PROD                                                                                                                            
07:38:57   449               library cache: mute 2321070391   914   0 PROD                                                                                                                            
07:39:43   517 09h36susy1t4n library cache: mute 3672626424   804   0 LB_STATUS_SEQ                                                                                                                    
07:39:46   507 09h36susy1t4n library cache: mute 3672626424  1004   0 LB_STATUS_SEQ                                                                                                                    
07:39:53   665 09h36susy1t4n library cache: mute 3672626424   914   0 LB_STATUS_SEQ                                                                                                                    
07:40:40   592               library cache: mute 2924456787   720   0 DATABASE                                                                                                                         
07:41:06   990 09h36susy1t4n library cache: mute 3672626424   221   0 LB_STATUS_SEQ      
 

                                                                          
/* Mutex impact */

select event, p1,cnt,
    CASE WHEN (event LIKE 'library cache:%' AND p1 <= power(2,17)) THEN
    'library cache bucket: '||p1   ELSE  (SELECT kglnaobj FROM x$kglob WHERE
    kglnahsh=p1 AND (kglhdadr = kglhdpar) and rownum=1) END mutex_object
    from (select p1,event,count(*) cnt from v$active_session_history where
    p1text='idn' and session_state='WAITING'
    group by p1,event)
    order by cnt desc;


EVENT                       P1        CNT MUTEX_OBJECT                                                                                                                                                 
------------------- ---------- ---------- ---------------                                                                                                                                              
library cache: mute 2321070391        286 PROD                                                                                                                                                       
library cache: mute 2924456787        269 DATABASE                                                                                                                                                     
library cache: mute 3672626424        100 LB_STATUS_SEQ                                                                                                                                                
library cache: mute 1535242682         16 UA_SEQ                                                                                                                                                       
library cache: mute  370351507          5 select substrb(                                                                                                                                              
library cache: mute  950094862          4 select min(minb                                                                                                                                              
library cache: mute      57559          3 library cache b                                                                                                                                              
library cache: mute 1455413935          3 select min(minb                                                                                                                                              
library cache: mute  459016664          2 SYS_GROUP                                                                                    



And of course this mutex_stats.sql useful script by Andrey.

SQL> @mutex_stats.sql mutex_identifier=3672626424
Connected.

--------------------------------------------                                  
Statistics for "Library Cache" mutex                                           
idn: 3672626424 address  0x00000005AE8F4098                                    
Interval:   31.8 s,  gets: 1471, sleeps:4300
                                 

Requests rate:                  lambda=        46.3 Hz                         
Sleeps rate:                    omega=        135.3 Hz                         
Utilization:                    rho=    .040000                                
Exclusive Utilization:          rho_x=  .040000                                
Shared Utilization:             rho_s=  .000000                                
Avg. holding time:              S=     864.45 us                               
Service rate:                   mu=          1156.8 Hz                         
Spin inefficiency:              k=    #########                                

Secondary statistics:                                                         
Slps /Miss:               kappa=#########                                      
Spin_gets/miss:           sigma=#########                                      
correction coeff.         eta= 1.0                                             
Secondary sleeps ratio          .745                                           

--------------------------------------------                                  
Avg. number of  sessions waiting:         .02                                  

Library cache object related to  mutex idn 3672626424 :
                     

SEQUENCE: PROD.LB_STATUS_SEQ                                                  

ADDR:              TYPE        PIN/s:    LOCK/s:    EXEC/s: Namespc           
-------------------------------------------------------------------            
00000005AE8F3F68   Parent          15          0          0       1            

Mutex related parameters:
_kgl_mutex_wait_time=0 _wait_yield_hp_mode=yield    
_wait_yield_mode=yield _wait_yield_sleep_freq=100 _wait_yield_sleep_time_msecs=1
_wait_yield_yield_freq=20                                                      

PL/SQL procedure successfully completed.




***

Jul 5, 2012

SQL with high version count

I was looking at different things when investigating "library cache : mutex x" wait issue. One of the problems were SQL with high version counts. In simple terms it means - for repeated execution of the SQL, child cursor could not be reused and thus a large number of new child cursors have been generated along with the executions.


I found that, one of our SQLs had over 1600 child cursors - which is too much!

SQL> select * from (
select count(sc.sql_id) cursor_count, a.sql_id, a.sql_text
from v$sqlarea a, v$sql_shared_cursor sc
where  a.sql_id = sc.sql_id
group by a.sql_id, a.sql_text
order by cursor_count desc)
where rownum <=10; 

CURSOR_COUNT SQL_ID        SQL_TEXT
------------ ------------- ------------------------------------------------------------
        1693 2vzfs9ww6u3xw INSERT INTO admin_record (id, clnt, clnt_display
                           _name, pr_id, mar_id, mar_medication_id, medication_name,
                           prescriber_id, prescriber_display_name, dosage, total_dosage
                           , total_unit, measurement_unit, form_of_dosage, other_form_o
                           f_dosage, route, other_route, frequency, medication_type, ad
                           minister_time, scheduled_time, record_type, comments, admini
                           stered_by, administered_by_display_name, user_initial, recor
                           ded_by, recorded_by_display_name, updated_by, administer_dat
                           e, record_date, updated_date, administer_date_int, record_da
                           te_int, tz, status, test_form, version, mar_med_row_num, mar
                           _med_column_num, scheduled_entry, schede_date, scheduled_b
                           y_id) VALUES (admin_rec_seq.nextval, :1, :2, :3, :4,
                            :5, :6, :7, :8, :9, :10, :11, :12, :13, :14, :15, :16, :17,
                            :18, :19, :20, :21, :22, :23, :24, :25, :26, :27, :28, :29,
                            :30, :31, :32, :33, :34, :35, :36, :37, :38, :39, :40, :41,
                            :42)



The insert statement consumes the highest amount of shared pool memory by producing so many child cursors. This is because, we are not able to reuse cursors even though we use bind variables. During yesterday problem, the number of child cursors were 1693. Producing so many child cursors (and searching within the list) may causes shared pool mutex waits. This is an issue with 11g where database does not automatically obsolete a parent cursor "anymore" when it reaches 1024 child cursors.

The main problem with this SQL is - there are bind mismatch in most cases when the SQL comes into database for an execution with all its 42 bind variables.


SQL> select bind_mismatch, bind_length_upgradeable
from  v$sql_shared_cursor
where sql_id = '2vzfs9ww6u3xw'
and rownum <=20; 
B B
- -
Y Y
Y N
Y Y
Y Y
Y N
Y N
Y Y
Y Y
Y Y
Y N
Y Y
Y N
Y N
Y N
Y N
Y N
......

......


This should not be happening as we use bind variables. Then I looked at our application code. I found that, the developer picks column values from objects and those values could be null. I found that we need to change the jdbc prepared statement coding style and need to use "java.sql.Types" to set null values when object values are null. This way cursor will understand the bind type and won't create  child cursors for type mismatch caused by setting 'null' explicitly.


For example, for a column we use -

preparedStatement.setObject(

index++, record.getAdministeredBy() != null ? record.getAdministeredBy().getId() : null);


We need to program like -
preparedStatement.setNull(index++, java.sql.Types.INTEGER); 
when the object is null so that the cursor understand the datatype for the bind 
variable.

We are going to fix this soon in production!

Here is a nice article related with high number of child cursor problems.

http://www.usn-it.de/index.php/2010/08/04/oracle112-mutex-s-too-many-child-cursors/

***

Jul 2, 2012

Upgrading 11.2.0.2.0 to 11.2.0.3.0

As Oracle changed upgrade approach (now recommended Out-of-place upgrade), I would put those quick steps here how I did that -


1. Go to the document ID 730365.1 in My Oracle Support (MOS). Go to the link "11.2.0.x.x TO 11.2.0.3.0"

2. Download Patch:10404530

3. Download pre-upgrade utility script utlu112i_5.sql. MOS document ID 884522.1

4. Run utlu112i_5.sql as sysdba

5. Resolve the warnings

6. Gather dictionary stats with "exec dbms_stats.gather_dictionary_stats;"

7. Go to the new installation downloads and run installation

8. Specify “Install Database Software only” and point to a new oracle home – for example /oracle/app/oracle/product/11.2.0/dbhome_2

9. Run through the rest of the installation. Run the root.sh at the end as root user.

10. When finished, set the environment variable for new home -
$export ORACLE_HOME=/oracle/app/oracle/product/11.2.0/dbhome_2
$export PATH=$ORACLE_HOME/bin:$PATH

11. Now run the upgrade -
$ which dbua
$/oracle/app/oracle/product/11.2.0/dbhome_2/bin/dbua

12. Dialogue will ask which db to upgrade. Select the db and click Next

13. Verify oratab content. This may be located in /var/opt/oracle/oratab or /etc/oratab

14. Check the latest version with select * from v$version;


***