Monday, July 28, 2014

ORACLE 11g R2 RAC Clusterware Startup

ORACLE 11g R2 RAC Clusterware Startup


The restart of a UNIX server call initialization scripts to start processes and daemons. Every platform has a unique directory structure and follows a method to implement server startup sequence. In Linux platform (prior to Linux 6), initialization scripts are started by calling scripts in the /etc/rcX.d directories, where X denotes the run level of the UNIX server. Typically, Clusterware is started at run level 3. For example, ohasd daemon started by /etc/rc3.d/S96ohasd file by supplying start as an argument. File S96ohasd is linked to /etc/init.d/ohasd.
S96ohasd -> /etc/init.d/ohasd

/etc/rc3.d/S96ohasd start  # init daemon starting ohasd.

Similarly, a server shutdown will call scripts in rcX.d directories, for example, ohasd is shut down by calling K15ohasd script:
K15ohasd -> /etc/init.d/ohasd
/etc/rc3.d/K15ohasd stop  #UNIX daemons stopping ohasd

In Summary, server startup will call files matching the pattern of S* in the /etc/rcX.d directories. Calling sequence of the scripts is in the lexical order of script name. For example, S10cscape will be called prior to S96ohasd, as the script S10cscape occurs earlier in the lexical sequence.
Google if you want to learn further about RC startup sequence. Of course, Linux 6 introduces Upstart feature and the mechanism is a little different: http://en.wikipedia.org/wiki/Upstart
That’s not the whole story!
Have you ever thought why the ‘crsctl start crs’ returns immediately? You can guess that Clusterware is started in the background as the command returns to UNIX prompt almost immediately. Executing the crsctl command just modifies the ohasdrun file content to ‘restart’. It doesn’t actually perform the task of starting the clusterware. Daemon init.ohasd reads the ohasdrun file every few seconds and starts the Clusterware if the file content is changed to ‘restart’.
# cat /etc/oracle/scls_scr/oel6rac1/root/ohasdrun
restart
If you stop has using ‘crsctl stop has’ , then the ohasdstr file content is modified to stop and so, init.ohasd daemon will not restart Clusterware. However, stop command is synchronous and executes the stop of clusterware too.
# crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'oel6rac1'
CRS-2673: Attempting to stop 'ora.crsd' on 'oel6rac1'
..
The content of ohasdrun is modified to stop:
# cat  /etc/oracle/scls_scr/oel6rac1/root/ohasdrun
stop # 
In a nutshell, init.ohasd daemon is monitoring the ohasdrun file and starts the Clusterware stack if the value in the file is modified to restart.
Inittab
Init.ohasd daemon is an essential daemon for Clusterware startup. Even if the Clusterware is not running on a node, you can start the Clusterware from a different node. How does that work? Init.ohasd is the reason.
The init.ohasd daemon is started from /etc/inittab. Entries in the inittab is monitored by the init daemon (pid=1) and init daemon will react if the inittab file is modified. The init daemon monitors all processes listed in the inittab file and reacts according to the configuration in the inittab file. For example, if init.ohasd fails for some reason, it is immediately restarted by init daemon.
Following is an example entry in the inittab file. Fields are separated a colon, second field indicates that init.ohasd will be started in run level 3, and the third field indicates an action field. Restart in the action field means that, if the target process exist, just continue scanning inittab file; if the target process does not exist, then restart the process.
#cat /etc/inittab
…
h1:3:respawn:/etc/init.d/init.ohasd run >/dev/null 2>&1 
If you issue a clusterware startup command from a remote node, that a message sent to init.ohasd daemon in the target node, and the daemon initates the clusterware startup. So, init.ohasd will be always running irrespective of whether the Clusterware is running or not.
You can use strace on init.ohasd to verify this behavior. Following are a few relevant lines from the output of strace command of init.ohasd process:
…
5641  1369083862.828494 open("/etc/oracle/scls_scr/oel6rac1/root/ohasdrun", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
5641  1369083862.828581 dup2(3, 1)      = 1
5641  1369083862.828606 close(3)        = 0
5641  1369083862.828631 execve("/bin/echo", ["/bin/echo", "restart"], [/* 12 vars */]) = 0
…
Just for fun!
So, what happens if I manually modify that ohasdrun to restart? I copied the ohasdrun to a temporary file (/tmp/a1.lst) and stopped the clusterware.
cp /etc/oracle/scls_scr/oel6rac1/root/ohasdrun /tmp/a1.lst 
# crsctl stop has
I verified that Clusterware is completely stopped. Now, I will copy the file again overlaying ohasdrun:
# cat /tmp/a1.lst
restart
# cp /tmp/a1.lst  /etc/oracle/scls_scr/oel6rac1/root/ohasdrun
After a minute or so, I see that Clusterware processes are started. Not that, you would use this type of hack in a Production cluster, but this test proves my point.
It’s also important not to remove the files in the scls_scr directories. Any removal of the files underneath the scls_scr directory structure can lead to an invalid configuration.
There are also two more files in the scls_scr directory structure. Ohasdstr file decides if the HAS daemon should be started automatically or not. For example, if you execute ‘crsctl disable has’, that command modifies ohasdstr file contents to ‘disable’. Similarly, crsstart file controls CRS daemon startup. Again, you should use recommended commands to control the startup, rather than modifying any of these files directly.
11.2.0.1 and HugePages
If you tried to configure hugepages in 11.2.0.1 clusterware, by increasing memlock kernel parameter for GRID and database owner, you would have realized that database doesn’t use hugepages if started by the clusterware. Database startup using sqlplus will use hugepages, but the database startup using srvctl may not use hugepages.
As the new processes are cloned from the init.ohasd daemon, until init.ohasd is restarted, user level memlock limit changes are not correctly reflected in an already running process. Only recommended way to resolve the problem is to restart the node completely (not just the clusterware), as init.ohasd daemon must be restarted to reflect the user level limits.
Version 11.2.0.2 fixes this issue by explicitly calling ulimit command from /etc/init.d/ohasd files.

Summary 

In Summary, init.ohasd process is an important process. Files underneath scls_scr directory is controlling the startup behavior. This also means that if a server is restarted, you don’t need to explicitly stop the Clusterware. You can let the server startup to restart the Clusterware.

No comments: