Friday, August 23, 2013

Oracle Latches and Mutexes

www.oracleinaction.com

LATCHES
  Objects in the library cache are attached to hash buckets as doubly linked lists (hash chains). Creating a new object in the library cache means linking it into a chain: picking two existing objects that currently point to each other, setting the forward and backward pointers in the new object to point to them, and then modifying the backward pointer on one object and the forward pointer on the other object to point to the new object. Conversely, if there is not enough memory (no single chunk of free memory large enough) to create a new object, Oracle uses an LRU algorithm to age existing objects out, delinking them from their hash chains and returning their memory to the free lists.
    Hash buckets           Hash chains
         H1             S1 <-> S2 <-> S5 <-> S9
                        ^                     ^
                        |_____________________|
         H2             S3 <-> S8
                        ^       ^
                        |_______|
         H3             S4 <-> S6
                        ^       ^
                        |_______|
         H4             S7
For example, if a user issues a statement S10 which hashes to the same value as bucket H2, the following steps need to be taken:
- set forward pointer of S10 to S8
- set backward pointer of S10 to S3
- set forward pointer of S3 to S10
- set backward pointer of S8 to S10
so that finally hash bucket H2 looks like:
        H2              S3 <-> S10 <-> S8
                        ^               ^
                        |_______________|
    Now consider a scenario: User1 issues statement S1 and finds that it belongs to hash bucket H1. At the same time, another user, User2, needs to load another object, for which object S2 in bucket H1 has to be aged out. To delink S2 from bucket H1, User2 will:
  – set forward pointer of S1 to point to S5
  – set backward pointer of S5 to point to S1
  – set forward/backward pointers of S2 so that it is attached to the memory free list.
so that finally hash bucket H1 looks like:
        H1              S1 <-> S5 <-> S9
                        ^              ^
                        |______________|
and S2 is linked to the doubly linked list of free memory.
If both users are allowed to access the contents of hash bucket H1 at the same time, User1, scanning the doubly linked hash chain, could end up following a pointer to the object User2 has just discarded (S2), because User2 has not yet had time to correct both of S2's pointers. Somehow Oracle has to ensure that a user can't modify the linked list while another user is walking it, and vice versa.
  That is where Oracle latches come into the picture. Latches are simple, low-level serialization mechanisms that protect shared data structures in the System Global Area (SGA). Latches eliminate the problems that would appear if multiple processes were allowed to modify shared memory simultaneously. A server or background process acquires a latch for a very short time while manipulating or looking at one of these structures.
  The shared pool latch is used to protect and achieve concurrency in the shared pool, while the library cache latches protect the library cache hash chains. Multiple hash buckets are protected by one latch, so whenever a process wants to search for an object in a hash bucket, it first has to obtain the latch protecting that bucket, walk the linked list, do whatever it needs to do with the object if it finds it, and then drop the latch. If another user wants to walk a hash chain protected by the same latch, he has to wait until the first user has dropped the latch. This gives rise to latch contention.
                             LATCH CONTENTION – LOCKS AND PINS
  Latch contention occurs because a latch is held for too long, or because demand for the latch is too high, or both.
There are three important aspects, then, of latch contention:
   • Number of different latches covering the library cache - The more latches there are, the fewer buckets are protected by each latch, and hence the less likely you are to collide with someone who needs to walk a linked list covered by the latch you want; on the other hand, the more latches you have, the more work you may have to do in some form of maintenance, reporting, or garbage collection.
   In the shared pool architecture there is a fixed number of hash buckets (which can grow to the next level as needed), protected by a fixed number of library cache latches.
 Until 10g the number of latches covering the library cache was remarkably small. The number was dependent on the number of CPUs (it's roughly the same as the cpu_count parameter) up to a maximum of 67 latches. That's a surprisingly small number, really, given the potential for collisions on even a small number of frequently executed statements; and two processes don't have to be executing the same SQL, or even accessing the same hash bucket, to collide on the same latch - they need only be accessing two hash buckets covered by the same latch (a query shown after this list lets you check the numbers on your own system).
  • Number of times you need to go through the process of acquiring a given latch - The more times you have to get the latch and walk the list, the more likely you are to collide with someone who wants the same latch.
  We can minimize the number of times we have to search the library cache for an object by attaching a KGL lock to the object once we've found it, giving us a shortcut to it (stored in the PGA as an open or cached cursor). The next time the same statement is issued, we find it in the PGA itself and need not search the hash chains for the object again (a softer parse).
• Time for which you have to hold the latch - The longer everyone holds latches, the more likely they are to have problems with other people holding latches that they want to acquire.
 We ought to avoid holding latches for any length of time. This means that if we need to do something time-consuming when we’ve found the bit of memory we were searching for, we can pin the object  to protect that memory while we’re using it so that we can release the latch.
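To get a feel for the first of these aspects on your own system, you can compare the number of library cache child latches with cpu_count. This is only an illustrative sketch against the documented V$LATCH_CHILDREN and V$PARAMETER views; on 11g the first query simply returns 0 because, as discussed later, these latches are replaced by mutexes.

-- How many child latches cover the library cache (10g and earlier)
SELECT COUNT(*) AS library_cache_latches
FROM   v$latch_children
WHERE  name = 'library cache';

-- Compare with the CPU count seen by the instance
SELECT value AS cpu_count
FROM   v$parameter
WHERE  name = 'cpu_count';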
   So, after the server process has identified the corresponding hash bucket, the steps followed are:
1. Obtain the latch covering the hash bucket.
2. Obtain a lock on the object so that a pointer to it is placed in the PGA (as an open cursor).
3. Pin the object and release the latch.
4. Do whatever you want to do with the object (e.g. execute a statement/procedure).
5. Obtain the latch, unpin the object and release the latch.
Locks and Pins are usually in share mode unless modifications are being made.
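The locks and pins currently held on library cache objects can be seen (without touching any x$ tables) through the documented V$DB_OBJECT_CACHE view. A quick sketch:

-- Objects in the library cache that currently have locks or pins on them
SELECT owner, name, type, locks, pins, executions
FROM   v$db_object_cache
WHERE  locks > 0 OR pins > 0
ORDER  BY pins DESC, locks DESC;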
          HOW ARE LIBRARY CACHE LOCKS (KGL LOCKS) OBTAINED?
There are three major ways in which you can obtain KGL locks :
• You can write your front-end code to “hold” cursors that you know you are going to use frequently.
• You can set the session_cached_cursors parameter so that Oracle library code will automatically start holding a cursor if it sees you using a statement more than two or three times.
• You can benefit from the semiautomatic way in which PL/SQL will hold cursors that have been opened (explicitly or implicitly) from within a PL/SQL call; from Oracle 9.2.0.5 onward, setting the session_cached_cursors parameter also controls this feature. The session_cached_cursors parameter dictates how many cursors can be held when your code does not explicitly hold cursors, and it also controls the number of cursors that can be held open by PL/SQL code being run by the session.
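The effect of session_cached_cursors is easy to watch from the session statistics. A sketch, run from the session you are interested in (the statistic names are the standard ones exposed through V$STATNAME):

-- Current setting of the session cursor cache
SELECT value FROM v$parameter WHERE name = 'session_cached_cursors';

-- How often this session found a cursor in its own cursor cache
SELECT n.name, s.value
FROM   v$mystat s JOIN v$statname n ON n.statistic# = s.statistic#
WHERE  n.name IN ('session cursor cache hits',
                  'session cursor cache count',
                  'parse count (total)',
                  'parse count (hard)');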
                                        BENEFITS OF KGL LOCKS
  Besides minimizing the number of times we have to search the library cache for an object, locks provide additional benefits:
-  By means of locks, one client can prevent other clients from accessing the same object. Library cache locks use object handles as resource structures and locks are taken on that resource. If the resources are not available in a compatible mode, then sessions must wait for library cache objects to be available.
-  Library cache locks (aka parse locks) are needed to maintain the dependency mechanism between objects and their dependent objects. For example, if an object's definition (e.g. a table) needs to be modified, then dependent objects (e.g. the execution plan of a SQL statement referencing the table) must be invalidated. This dependency is maintained using library cache locks: if a column is dropped from a table, all SQL statements dependent on that table must be invalidated and re-parsed on their next access. Library cache locks are designed to implement this tracking mechanism. If we have a lock on a SQL statement, we have a pointer to its execution plan stored in the PGA. If the structure of an underlying table is modified, the parse lock on the statement is broken and the cursor in the PGA is invalidated, i.e. the next time we re-execute the same statement it will be searched for in the library cache and subsequently hard parsed.
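A simple way to watch a parse lock being broken is to issue DDL against a table and then look at the INVALIDATIONS column of V$SQL for a dependent statement. The sketch below uses a throwaway table T purely as an example:

CREATE TABLE t (c1 NUMBER, c2 NUMBER);

-- Load a cursor that depends on column C2 into the library cache
SELECT COUNT(*) FROM t WHERE c2 IS NOT NULL;

-- DDL on the referenced column breaks the parse locks of dependent cursors
ALTER TABLE t DROP COLUMN c2;

-- The dependent cursor now shows an invalidation and will be hard parsed on next use
SELECT sql_text, loads, invalidations, parse_calls
FROM   v$sql
WHERE  sql_text LIKE 'SELECT COUNT(*) FROM t WHERE c2%';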
                       WHY ARE LIBRARY CACHE PINS (KGL PINS) NEEDED
  The KGL pin comes into play when you actually use an object. Although a KGL lock will hold an object in memory (a cursor in the PGA pointing to the object in the shared pool), there are parts of the object that are dynamically re-creatable (the execution plan for a SQL statement, for example), and these can still be discarded if there is heavy demand for memory in the shared pool, even if you have a KGL lock in place (the memory location to which the cursor in the PGA points may then hold some other object).
However, when you are actually using an object (running an SQL statement, say), you need to ensure that the re-creatable bits can’t be pushed out of memory, so you pin the object to protect them.
Pinning an object causes the heaps to be loaded into memory. If a client wants to modify or examine  the object, he must acquire a pin after the lock.
 For example, an underlying object should not be modified while a session is executing or accessing a dependent object (a SQL statement). If a user tries to modify the underlying object (the table structure, for example), he will need to acquire a library cache pin in exclusive mode on the table. If a dependent SQL statement is executing, the pin won't be available and there will be waits for library cache pins. So, before the parse locks on a library cache object can be broken, library cache pins must be acquired in exclusive mode, and only then can the library cache object be altered. Typically this shows up with long-running SQL statements.
Library cache locks and pins are externalized in three x$ tables: x$kgllk, x$kglpn and x$kglob.
 . x$kgllk externalizes all locking structures on an object.
 . Entries in x$kglob act as the resource structures.
 . x$kglpn externalizes all library cache pins.
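Since these are x$ tables they can be queried only as SYS, and their columns are not documented; the sketch below relies on the commonly quoted column names (KGLPNUSE, KGLPNHDL, KGLPNMOD, KGLPNREQ, KGLHDADR, KGLNAOBJ), and the object name MY_PROC is just a placeholder. It lists the sessions holding or requesting a pin on that object:

-- Sessions holding or requesting a pin on object MY_PROC (run as SYS; column names assumed)
SELECT s.sid, s.username,
       p.kglpnmod AS mode_held, p.kglpnreq AS mode_requested
FROM   x$kglpn p, v$session s, x$kglob o
WHERE  p.kglpnuse = s.saddr
AND    p.kglpnhdl = o.kglhdadr
AND    o.kglnaobj = 'MY_PROC';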
                             ISSUES WITH KGL LOCKS AND PINS
 The mechanisms of locking and pinning in the library cache allowed two problems to creep in.
 The KGL locks and KGL pins themselves are simply little packets of memory that, at one time, were individually created and discarded on demand by making calls to allocate memory from the shared pool. Since a KGL lock is about 200 bytes and a KGL pin is about 40 bytes, you can imagine that between them the constant allocation and freeing of memory could cause the free memory in the shared pool to end up in a “honey-combed” state—i.e., lots of free memory in total, but not many large pieces of contiguous memory. The KGL pins were particularly nasty because they would come and go very quickly; the KGL locks weren’t quite so bad because they could stay attached to an object for some time.
 Another issue with the KGL locks and KGL pins was that, to use them, you had to constantly create a packet of memory, label it properly, and insert it into a linked list (or, conversely, remove a link from a linked list and put the memory back in the shared pool), and you had to do this while holding a latch exclusively. So for very busy systems, the whole lock/pin mechanism could become a significant scalability threat.
  In 10g, Oracle Corp. introduced the library cache lock latch and the library cache pin latch, which allowed some concurrent activity to be done on different hash buckets covered by the same library cache latch (you could pin a cursor in one bucket while I locked a cursor in another bucket because we wouldn’t both need to hold the same library cache latch at the same time).
However, as we moved through 10g to 11g, the whole KGL lock/KGL pin mechanism was gradually replaced by the mutex mechanism.
                                                    MUTEXES
  To improve cursor execution and also hard parsing, a new fine-grained memory serialization mechanism was created in 10gR2. For certain shared-cursor related operations, mutexes (mutual exclusion objects) were used as a replacement for library cache latches and library cache pins. Basically, a mutex works the same way a latch does, but the code path for operating mutexes is shorter, so they are lightweight and often directly hardware-supported. Thus, using mutexes is faster, uses less CPU and also allows significantly improved concurrency over the latch mechanism. On a 32-bit Linux installation, a regular latch structure is 110 bytes, whereas a mutex is 28 bytes in size. Also, mutexes take fewer instructions: it takes 150-200 instructions to get a latch, but a mutex get is about 30-35 instructions.
 The mutex structure is smaller because mutexes give you less information about who is waiting for what and for how long: you have information about sleep times, but not the number of requests and misses.
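What little instrumentation mutexes do expose is visible in V$MUTEX_SLEEP (and V$MUTEX_SLEEP_HISTORY) from 10.2 onwards. A quick sketch:

-- Where mutex sleeps are occurring, by mutex type and code location
SELECT mutex_type, location, sleeps, wait_time
FROM   v$mutex_sleep
ORDER  BY sleeps DESC;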
BENEFITS OF MUTEXES:
Mutexes can be taken in shared or exclusive mode, and getting a mutex can be done in wait or no-wait mode.
1. Mutexes have less potential for false contention. Previously, multiple hash buckets were protected by the same latch, so two users searching two different buckets protected by the same latch were prone to unnecessary false contention, i.e. the contention was for the protection mechanism (the latch) rather than for the target object being accessed. Unlike latches, with mutexes it is possible to create a mutex for each structure protected, which makes false contention much less likely because each structure can be protected by its own mutex. Now each parent and child cursor has its own mutex, and we do not have to contend for a latch that is co-used by many cursors.
2. Mutexes replace latches and pins: Mutexes have a dual nature; they can act as a serialization mechanism (like a latch) and also as a pin (for example, preventing an object from aging out). Whereas a latch can't be held by multiple sessions, a mutex can be referenced by many sessions, provided that all sessions reference the mutex in shared mode. The total number of sessions referencing a mutex in shared (S) mode is called the reference count (ref count). The ref count of a mutex is stored in the mutex itself. A mutex can also be held in exclusive (X) mode, by one session only.
  The ref count of a mutex is a replacement for a library cache pin. With latches, whenever a session executed a statement it first created and then deleted a library cache pin, but with mutexes each session simply increments and decrements the ref count (so the ref count replaces n distinct pin structures). An object can't be aged out until its ref count drops to zero, i.e. no user is currently accessing the object.
3. The mutex structure is located in each child cursor handle and the mutex itself acts as cursor pin structure. To change the cursor pin status, earlier we needed to get the library cache latch but now we can modify the cursor’s mutex refcount directly.
As per Oracle guru Tanel Poder: But the main scalability benefit comes from the fact that there's a mutex structure in each child cursor handle and the mutex itself acts as cursor pin structure. So if you have a cursor open (or cached in session cursor cache) you don't need to get the library cache latch (which was previously needed for changing cursor pin status), but you can modify the cursor's mutex refcount directly (with help of pointers in open cursor state area in the session's UGA).

Therefore you have much higher scalability when pinning/unpinning cursors (no library cache latching needed, virtually no false contention) and no separate pin structures need to be allocated/maintained.

Few notes:
1) library cache latching is still needed for parsing etc, the mutexes address only the pinning issue in library cache
2) mutexes are currently used for library cache cursors (not other objects like PL/SQL stored procs, table defs etc)
3) As mutexes are a generic mechanism (not library cache specific) they’re used in V$SQLSTATS underlying structures too
4) When mutexes are enabled, you won’t see cursor pins from X$KGLPN anymore (as X$KGLPN is a fixed table based on the KGL pin array – which wouldn’t be used for cursors anymore)
 Note that latches and mutexes are independent mechanisms, i.e. a process can hold a latch and a mutex at the same time. In Oracle 10.2.0.2+ the library cache pin latch usage was replaced with mutexes whenever _kks_use_mutex_pin was true; a few other things, like the V$SQLSTATS arrays and parent cursor examination, were also protected by mutexes. However, the right child cursor lookup (using kksfbc()) was still protected by library cache latches, which could become a problem with frequent soft parsing combined with too small a cursor cache and long library cache hash chains (remember, the library cache latches were always taken exclusively, even for plain hash chain scanning).
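Whether mutexes are actually relieving the pressure, or whether mutex contention has simply taken the place of the old latch contention, can be judged from the system-wide wait events; the mutex-related events start with 'cursor:' or 'library cache: mutex'. A sketch using the documented V$SYSTEM_EVENT view:

-- Mutex-related waits accumulated since instance startup
SELECT event, total_waits, time_waited
FROM   v$system_event
WHERE  event LIKE 'cursor:%'
   OR  event LIKE 'library cache: mutex%'
ORDER  BY time_waited DESC;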
In 10g you see there are 3 types of mutexes used:
Cursor Stat
Cursor Parent
Cursor Pin
  So 11.2 got rid of all the little packets of memory and the linked lists; in fact, it even got rid of the latches used to protect the hash chains, and replaced them all with mutexes. Each hash bucket is protected by a mutex. With mutexes on hash buckets you have one micro-latch per bucket instead of a maximum of 67 latches covering 131,072 buckets. And with a mutex on each library cache object to represent the KGL locks and another to represent the KGL pins, you don't have to do all that memory allocation and deallocation, and you don't have to run code to connect things to linked lists. So, starting from 11g, each library cache bucket is protected by a separate mutex (yes, all 131,072 of them!).
 In 11g there are a couple of additional mutexes; one of them (and the most important) is the Library Cache mutex.
In 11g all library cache related latches except "library cache load lock" are gone, and the corresponding operations are protected by mutexes instead. The "library cache" latches have been replaced by "Library Cache" mutexes.
The following latches which were present in 10g are no longer seen in 11g:
library cache pin allocation
library cache lock allocation
library cache hash chains
library cache lock
library cache
library cache pin
The only library cache latch present in 11g is:
library cache load lock
The following wait events related to the library cache were present in 10g but are no longer seen in 11g:
latch: library cache
latch: library cache lock
latch: library cache pin
The wait events related to library cache present in 11g are:
library cache pin
library cache lock
library cache load lock
library cache: mutex X
library cache: mutex S
OSD IPC library
library cache revalidation
library cache shutdown
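You can verify which of these latches and wait events exist in the release you are running by querying the documented V$LATCHNAME and V$EVENT_NAME views. A quick sketch:

-- Library cache related latches present in this release
SELECT name FROM v$latchname WHERE name LIKE 'library cache%';

-- Library cache related wait events present in this release
SELECT name FROM v$event_name WHERE name LIKE 'library cache%' ORDER BY name;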
 Of course, the library cache mutexes still don't solve all the problems in the world (especially the ones related to excessive hard parsing!). For example, there is still a chance of a hash collision between two entirely different cursors. Also, if there are many child cursors under a parent and the application's cursor management is poor (e.g. cursors are closed after every execution and no session cursor caching is done), then you can still get contention on the mutex due to continuous library cache hash chain traversal.
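When such contention shows up, parent cursors with an unusually large number of child cursors are a common culprit. A quick sketch against V$SQL (the threshold of 20 children is an arbitrary example):

-- Parent cursors with many child cursors
SELECT sql_id, COUNT(*) AS child_cursors
FROM   v$sql
GROUP  BY sql_id
HAVING COUNT(*) > 20
ORDER  BY child_cursors DESC;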
A demonstration of library cache locks/pins can be seen here.
In my next article in this series on shared pool, I will give a quick overview of shared pool tuning.
I hope this information was useful. Your comments and suggestions are always welcome.
References:
Oracle Core: Essential Internals for DBAs and Developers by Jonathan Lewis
