Jan 31, 2008

Latch Contention in Oracle

What is Latch?

Well, the hardware definition of latch is - a window or door lock. The electronic definition of latch is - an electronic circuit used to store information.

Then what is Latch in Oracle? Very uncommon as the problem is super sophisticated.

Latch is one kind of very quick (could be acquired and released in nanoseconds) lock or serialization mechanism (makes more sense) to protect Oracle’s shared memory in SGA. Basically latch protects the same area of SGA being updated by more than one process.

Now question comes, what are protected and why?

Each Oracle operation needs to read and update SGA. For example –

  1. When a query reads a block from disk, it will modify a free block in buffer cache and adjust the buffer cache LRU chain
  2. When a new SQL statement is parsed, it will be added to the library cache within SGA
  3. When DML issued and modifications are made in blocks, changes are placed in redo buffer
  4. Database writer periodically (after commit, after SCN change or after each 3 sec) writes buffers from memory to disk and updates their status from dirty to clean.
  5. Redo log writer writes blocks from redo buffer to redo logs.

Latch prevents any of these operations from colliding and possibly corrupting the SGA.

If the specific latch is already in use by another process, oracle will retry very frequently with a cumulative delay up to certain times (controlled by hidden parameter) called spin count. First time one process fails to acquire the latch, it will attempt to awaken after 10 milliseconds up to its spin count. Subsequent waits will increase in duration - might be seconds in extreme cases. This affects response time and throughput.

Most common type latch is Cache Buffer Latch and Cache buffer LRU chain latch which are caused for highly accessing blocks, called hot blocks. Contention on these latches is typically caused by concurrent access to a very hot block. The most common type of such hot block is index root or block branch.

Unfortunately we are experiencing Cache Buffer Latch contention in database now a days.

Identifying HOT BLocks

Possible hot blocks in the buffer cache normally can be identified by a high or rapid increasing wait count on the CACHE BUFFERS CHAINS latch.
This latch is acquired when searching for data blocks cached in the buffer cache. Since the Buffer cache is implemented as a sum of chains of blocks, each of those chains is protected by a child of this latch when needs to be scanned. Contention in this latch can be caused by very heavy access to a single block. This can require the application to be reviewed.

By examining the waits on this latch, information about the segment and the specific block can be obtained using the following queries.

First determine which latch id (ADDR) are interesting by examining the number of sleeps for this latch. The higher the sleep count, the more interesting the latch id (ADDR) is:

select * from (
select CHILD# "cCHILD", ADDR "sADDR",
GETS "sGETS" , MISSES "sMISSES",
SLEEPS "sSLEEPS"
from v$latch_children
where name = 'cache buffers chains'
order by 5 desc, 4, 1, 2, 3
) where rownum <=20;

Have to run the above query a few times to to establish the id (ADDR) that has the most consistent amount of sleeps. Once the id (ADDR) with the highest sleep count is found then this latch address can be used to get more details about the blocks currently in the buffer cache protected by this latch. The query below should be run just after determining the ADDR with the highest sleep count.

column segment_name format a35
select /*+ RULE */
e.owner ||'.'|| e.segment_name segment_name,
e.extent_id extent#,
x.dbablk - e.block_id + 1 block#,
x.tch,
l.child#
from
sys.v$latch_children l,
sys.x$bh x,
sys.dba_extents e
where
x.hladdr = 'ADDR' and
e.file_id = x.file# and
x.hladdr = l.addr and
x.dbablk between e.block_id and e.block_id + e.blocks -1
order by x.tch desc ;

Depending on the TCH column (The number of times the block is hit by a SQL statement), you can identify a hotblock. The higher the value of the TCH column, the more frequent the block is accessed by SQL statements.

The solution of the problem will vary depending upon the problem and application architecture.
Some generic approach might be -

1. Creation of Reverse Index of those hot blocks
2. Distribute hot blocks table on different size cache segment
3. Ensure the usages of index to reduce full scan