Sunday, March 30, 2014

Oracle 11G R2 RAC Quick Ref commands for a DBA

11G R2 RAC  Quick Ref for a DBA

Commonly used Commands
*************************************************************************
su - grid -c "crs_stat -t -v"
su - grid -c "crsctl status resource -w \"TYPE co 'ora'\" -t"

srvctl start database -d stgtdb
srvctl start database -d stgrat
srvctl start database -d stgrptdb
srvctl status instance -d racdb -i racdb1
srvctl status nodeapps
srvctl config nodeapps
srvctl config database
srvctl config database -d racdb -a
srvctl status asm
srvctl config asm -a
srvctl config listener -a
srvctl status scan
srvctl config scan
srvctl status vip -n racnode1
srvctl status vip -n racnode2
srvctl config vip -n racnode1
srvctl config vip -n racnode2
srvctl config nodeapps -a -g -s -l
srvctl stop database -d racdb
srvctl start database -d racdb

crsctl check cluster
crsctl stop cluster
crsctl stop cluster -all
crsctl start cluster
crsctl start cluster -all
crsctl start cluster -n racnode1 racnode2

cluvfy comp scan -verbose
cluvfy comp clocksync -verbose




SELECT
    inst_id
  , instance_number inst_no
  , instance_name inst_name
  , parallel
  , status
  , database_status db_status
  , active_state state
  , host_name host
FROM gv$instance
ORDER BY inst_id;

select name from v$datafile
union
select member from v$logfile
union
select name from v$controlfile
union
select name from v$tempfile;


Checking CRS Status:

The below two commands are generally used to check the status of CRS. The first command lists the status of CRS
on the local node where as the other command shows the CRS status across all the nodes in Cluster.

crsctl check crs <<-- br="" for="" local="" node="" the="">crsctl check cluster <<-- br="" cluster="" for="" in="" nodes="" remote="" the="">
[root@node1-pub ~]# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
[root@node1-pub ~]#

Checking Viability of CSS across nodes:

crsctl check cluster

For this command to run, CSS needs to be running on the local node. The "ONLINE" status for remote node says that CSS is running on that node.
When CSS is down on the remote node, the status of "OFFLINE" is displayed for that node.

[root@node1-pub ~]# crsctl check cluster
node1-pub    ONLINE
node2-pub    ONLINE

Viewing Cluster name:

I use below command to get the name of Cluster. You can also dump the ocr and view the name from the dump file.

ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk '{print $3}'

[root@node1-pub ~]# ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk '{print $3}'
test-crs
[root@node1-pub ~]#

OR

ocrconfig -export /tmp/ocr_exp.dat -s online
for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done


[root@node1-pub ~]# ocrconfig -export /tmp/ocr_exp.dat -s online
[root@node1-pub ~]# for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done
test-crs
[root@node1-pub ~]#

OR

Oracle creates a directory with the same name as Cluster under the $ORA_CRS_HOME/cdata. you can get the cluster name from this directory as well.

[root@node1-pub ~]# ls /u01/app/crs/cdata
localhost  test-crs

Viewing No. Of Nodes configured in Cluster:

The below command can be used to find out the number of nodes registered into the cluster.
It also displays the node's Public name, Private name and Virtual name along with their numbers.

olsnodes -n -p -i

[root@node1-pub ~]# olsnodes -n -p -i
node1-pub       1       node1-prv       node1-vip
node2-pub       2       node2-prv       node2-vip

Viewing Votedisk Information:

The below command is used to view the no. of Votedisks configured in the Cluster.

crsctl query css votedisk

[root@node1-pub ~]# crsctl query css votedisk
0.     0    /u02/ocfs2/vote/VDFile_0
1.     0    /u02/ocfs2/vote/VDFile_1
2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
[root@node1-pub ~]#

[root@node1-pub ~]# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
[root@node1-pub ~]#

Viewing OCR Disk Information:

The below command is used to view the no. of OCR files configured in the Cluster. It also displays the version of OCR
as well as storage space information. You can only have 2 OCR files at max.

ocrcheck

[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          2
Total space (kbytes)     :     262120
Used space (kbytes)      :       3848
Available space (kbytes) :     258272
ID                       :  744414276
Device/File Name         : /u02/ocfs2/ocr/OCRfile_0
Device/File integrity check succeeded
Device/File Name         : /u02/ocfs2/ocr/OCRfile_1
Device/File integrity check succeeded

Cluster registry integrity check succeeded

Various Timeout Settings in Cluster:

Disktimeout:
Disk Latencies in seconds from node-to-Votedisk. Default Value is 200. (Disk IO)
Misscount:
Network Latencies in second from node-to-node (Interconnect). Default Value is 60 Sec (Linux) and 30 Sec in Unix platform. (Network IO)
Misscount < Disktimeout

NOTE: Do not change them without contacting Oracle Support. This may cause logical corruption to the Data.

IF
(Disk IO Time > Disktimeout) OR (Network IO time > Misscount)
THEN
REBOOT NODE
ELSE
DO NOT REBOOT
END IF;

crsctl get css disktimeout
crsctl get css misscount
crsctl get css  reboottime

[root@node1-pub ~]# crsctl get css disktimeout
200

[root@node1-pub ~]# crsctl get css misscount
Configuration parameter misscount is not defined. <<<<< This message indicates that the Misscount is not set maually and it is set to it's
Default Value On Linux, it is default to 60 Second. If you want to chang it, you can do that as below. (Not recommended)

[root@node1-pub ~]# crsctl set css misscount 100
Configuration parameter misscount is now set to 100.
[root@node1-pub ~]# crsctl get css misscount
100

The below command sets the value of misscount back to its Default values:

crsctl unset css misscount

[root@node1-pub ~]# crsctl unset css misscount

[root@node1-pub ~]# crsctl get css  reboottime
3

Add/Remove OCR file in Cluster:

Removing OCR File

(1) Get the Existing OCR file information by running ocrcheck utility.

[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          2
Total space (kbytes)     :     262120
Used space (kbytes)      :       3852
Available space (kbytes) :     258268
ID                       :  744414276
Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <-- br="" ocr="">Device/File integrity check succeeded
Device/File Name         : /u02/ocfs2/ocr/OCRfile_1 <-- br="" mirror="" ocr="">Device/File integrity check succeeded

Cluster registry integrity check succeeded

(2) The First command removes the OCR mirror (/u02/ocfs2/ocr/OCRfile_1). If you want to remove the OCR
file (/u02/ocfs2/ocr/OCRfile_1) run the next command.

ocrconfig -replace ocrmirror
ocrconfig -replace ocr

[root@node1-pub ~]# ocrconfig -replace ocrmirror
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          2
Total space (kbytes)     :     262120
Used space (kbytes)      :       3852
Available space (kbytes) :     258268
ID                       :  744414276
Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <<-- br="" file="" ocr="">Device/File integrity check succeeded

Device/File not configured  <-- any="" br="" existed="" mirror="" more="" not="" ocr="">
Cluster registry integrity check succeeded

Adding OCR

You need to add OCR or OCR Mirror file in a case where you want to move the existing OCR file location to the different Devices.
The below command add ths OCR mirror file if OCR file alread exists.

(1) Get the Current status of OCR:

[root@node1-pub ~]# ocrconfig -replace ocrmirror
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          2
Total space (kbytes)     :     262120
Used space (kbytes)      :       3852
Available space (kbytes) :     258268
ID                       :  744414276
Device/File Name         : /u02/ocfs2/ocr/OCRfile_0 <<-- br="" file="" ocr="">Device/File integrity check succeeded

Device/File not configured  <-- br="" does="" exist="" mirror="" not="" ocr="">
Cluster registry integrity check succeeded

As You can see, I only have one OCR file but not the second file which is OCR Mirror.
So, I can add second OCR (OCR Mirror) as below command.

ocrconfig -replace ocrmirror

[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_1
[root@node1-pub ~]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version                  :          2
Total space (kbytes)     :     262120
Used space (kbytes)      :       3852
Available space (kbytes) :     258268
ID                       :  744414276
Device/File Name         : /u02/ocfs2/ocr/OCRfile_0
Device/File integrity check succeeded
Device/File Name         : /u02/ocfs2/ocr/OCRfile_1
Device/File integrity check succeeded

Cluster registry integrity check succeeded

You can have at most 2 OCR devices (OCR itself and its single Mirror) in a cluster. Adding extra Mirror gives you below error message

[root@node1-pub ~]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_2
PROT-21: Invalid parameter
[root@node1-pub ~]#

Add/Remove Votedisk file in Cluster:

Adding Votedisk:

Get the existing Vote Disks associated into the cluster. To be safe, Bring crs cluster stack down on all the nodes
but one on which you are going to add votedisk from.

(1) Stop CRS on all the nodes in cluster but one.

[root@node2-pub ~]# crsctl stop crs

(2) Get the list of Existing Vote Disks

crsctl query css votedisk

[root@node1-pub ~]# crsctl query css votedisk
0.     0    /u02/ocfs2/vote/VDFile_0
1.     0    /u02/ocfs2/vote/VDFile_1
2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).

(3) Backup the VoteDisk file

Backup the existing votedisks as below as oracle:

dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0

[root@node1-pub ~]# su - oracle
[oracle@node1-pub ~]$ dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0
41024+0 records in
41024+0 records out
[oracle@node1-pub ~]$

(4) Add an Extra Votedisk into the Cluster:

If it is a OCFS, then touch the file as oracle. On raw devices, initialize the raw devices using "dd" command

touch /u02/ocfs2/vote/VDFile_3 <<-- as="" br="" oracle="">crsctl add css votedisk /u02/ocfs2/vote/VDFile_3 <<-- as="" br="" oracle="">crsctl query css votedisks

[root@node1-pub ~]# su - oracle
[oracle@node1-pub ~]$ touch /u02/ocfs2/vote/VDFile_3
[oracle@node1-pub ~]$ crsctl add css votedisk /u02/ocfs2/vote/VDFile_3
Now formatting voting disk: /u02/ocfs2/vote/VDFile_3.
Successful addition of voting disk /u02/ocfs2/vote/VDFile_3.

(5) Confirm that the file has been added successfully:

[root@node1-pub ~]# ls -l /u02/ocfs2/vote/VDFile_3
-rw-r-----  1 oracle oinstall 21004288 Oct  6 16:31 /u02/ocfs2/vote/VDFile_3
[root@node1-pub ~]# crsctl query css votedisks
Unknown parameter: votedisks
[root@node1-pub ~]# crsctl query css votedisk
0.     0    /u02/ocfs2/vote/VDFile_0
1.     0    /u02/ocfs2/vote/VDFile_1
2.     0    /u02/ocfs2/vote/VDFile_2
3.     0    /u02/ocfs2/vote/VDFile_3
Located 4 voting disk(s).
[root@node1-pub ~]#

Removing Votedisk:

Removing Votedisk from the cluster is very simple. Tthe below command removes the given votedisk from cluster configuration.

crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3

[root@node1-pub ~]# crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3
Successful deletion of voting disk /u02/ocfs2/vote/VDFile_3.
[root@node1-pub ~]#

[root@node1-pub ~]# crsctl query css votedisk
0.     0    /u02/ocfs2/vote/VDFile_0
1.     0    /u02/ocfs2/vote/VDFile_1
2.     0    /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
[root@node1-pub ~]#

Backing Up OCR

Oracle performs physical backup of OCR devices every 4 hours under the default backup direcory $ORA_CRS_HOME/cdata/
and then it rolls that forward to Daily, weekly and monthly backup. You can get the backup information by executing below command.

ocrconfig -showbackup

[root@node1-pub ~]# ocrconfig -showbackup

node2-pub     2007/09/03 17:46:47     /u01/app/crs/cdata/test-crs/backup00.ocr

node2-pub     2007/09/03 13:46:45     /u01/app/crs/cdata/test-crs/backup01.ocr

node2-pub     2007/09/03 09:46:44     /u01/app/crs/cdata/test-crs/backup02.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/day.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/week.ocr
[root@node1-pub ~]#

Manually backing up the OCR

ocrconfig -manualbackup <<--physical backup="" br="" ocr="" of="">
The above command backs up OCR under the default Backup directory. You can export the contents of the OCR using below command (Logical backup).

ocrconfig -export /tmp/ocr_exp.dat -s online <<-- backup="" br="" logical="" ocr="" of="">
Restoring OCR

The below command is used to restore the OCR from the physical backup. Shutdown CRS on all nodes.

ocrconfig -restore

Locate the avialable Backups

[root@node1-pub ~]# ocrconfig -showbackup

node2-pub     2007/09/03 17:46:47     /u01/app/crs/cdata/test-crs/backup00.ocr

node2-pub     2007/09/03 13:46:45     /u01/app/crs/cdata/test-crs/backup01.ocr

node2-pub     2007/09/03 09:46:44     /u01/app/crs/cdata/test-crs/backup02.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/day.ocr

node2-pub     2007/09/03 01:46:39     /u01/app/crs/cdata/test-crs/week.ocr

node1-pub     2007/10/07 13:50:41     /u01/app/crs/cdata/test-crs/backup_20071007_135041.ocr

Perform Restore from previous Backup

[root@node2-pub ~]# ocrconfig -restore /u01/app/crs/cdata/test-crs/week.ocr

The above command restore the OCR from week old backup.
If you have logical backup of OCR (taken using export option), then You can import it with the below command.

ocrconfig -import /tmp/ocr_exp.dat

Restoring Votedisks

Shutdown CRS on all the nodes in Cluster.
Locate the current location of the Votedisks
Restore each of the votedisks using "dd" command from the previous good backup of Votedisk taken using the same "dd" command.
Start CRS on all the nodes.

crsctl stop crs
crsctl query css votedisk
dd if= of= <<-- all="" br="" do="" for="" the="" this="" votedisks="">crsctl start crs

Changing Public and Virtual IP Address:


Current Config                                               Changed to

Node 1:

Public IP:       216.160.37.154                              192.168.10.11
VIP:             216.160.37.153                              192.168.10.111
subnet:          216.160.37.159                              192.168.10.0
Netmask:         255.255.255.248                             255.255.255.0
Interface used:  eth0                                        eth0
Hostname:        node1-pub.hingu.net                         node1-pub.hingu.net

Node 2:

Public IP:       216.160.37.156                              192.168.10.22
VIP:             216.160.37.157                              192.168.10.222
subnet:          216.160.37.159                              192.168.10.0
Netmask:         255.255.255.248                             255.255.255.0
Interface used:  eth0                                        eth0
Hostname:        node1-pub.hingu.net                         node2-pub.hingu.net


Take the Services, Database, ASM Instances and nodeapps down on both the Nodes in Cluster.
Also disable the nodeapps, asm and database instances to prevent them from restarting in case
if this node gets rebooted during this process.

srvctl stop service -d test
srvctl stop database -d test
srvctl stop asm -n node1-pub
srvctl stop asm -n node2-pub
srvctl stop nodeapps -n node1-pub,node1-pub2
srvctl disable instance -d test -i test1,test2
srvctl disable asm -n node1-pub
srvctl disable asm -n node2-pub
srvctl disable nodeapps -n node1-pub
srvctl disable nodeapps -n node2-pub

Modify the /etc/hosts and/or DNS, ifcfg-eth0 (local node) with the new IP values
on All the Nodes

Restart the specific network interface in order to use the new IP.

ifconfig eth0 down
ifconfig eth0 up

Or, you can restart the network.
CAUTION: on NAS, restarting entire network may cause the node to be rebooted.

Update the OCR with the New Public IP.
In case of public IP, you have to delete the interface first and then add it back with the new IP address.

As oracle user, Issue the below command:

oifcfg delif -global eth0
oifcfg setif -global eth0/192.168.10.0:public

Update the OCR with the New Virtual IP.
Virtual IP is part of the nodeapps and so you can modify the nodeapps to update the Virtual IP information.

As privileged user (root), Issue the below commands:

srvctl modify nodeapps -n node1-pub -A 192.168.10.111/255.255.255.0/eth0 <-- 1="" br="" for="" node="">srvctl modify nodeapps -n node1-pub -A 192.168.10.222/255.255.255.0/eth0 <-- 2="" br="" for="" node="">
Enable the nodeapps, ASM, database Instances for all the Nodes.

srvctl enable instance -d test -i test1,test2
srvctl enable asm -n node1-pub
srvctl enable asm -n node2-pub
srvctl enable nodeapps -n node1-pub
srvctl enable nodeapps -n node2-pub

Update the listener.ora file on each nodes with the correct IP addresses in case if it uses the IP address instead of the hostname.

Restart the Nodeapps, ASM and Database instance

srvctl start nodeapps -n node1-pub
srvctl start nodeapps -n node2-pub
srvctl start asm -n node1-pub
srvctl start asm -n node2-pub
srvctl start database -d test



RVCTL is used to manage the following resources (components):
***************************************************************

Component             Abbreviation                 Description
***********************************************************************
asm                            asm                                    Oracle ASM instance
database                  db                                       Database instance
diskgroup                dg                                       Oracle ASM disk group
filesystem                filesystem                       Oracle ASM file system
home                         home                                Oracle home or Oracle Clusterware home
listener                     lsnr                                    Oracle Net listener
service                     serv                                    Database service
ons, eons                ons, eons                          Oracle Notification Services (ONS)
***********************************************************************


Oracle entities (such as resources, resource types, and server pools) that have names beginning with ora are managed only by SRVCTL (and not by CRSCTL) unless you are directed to do so by Oracle Support. The cluster specific commands are generally managed by CRSCTL.

Syntax:

srvctl  command object options

The available commands used with SRVCTL are:

Command           Description
add               Adds a component to the Oracle Restart configuration.
config            Displays the Oracle Restart configuration for a component.
disable           Disables management by Oracle Restart for a component.
enable            Reenables management by Oracle Restart for a component.
getenv            Displays environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener.
modify            Modifies the Oracle Restart configuration for a component.
remove            Removes a component from the Oracle Restart configuration.
setenv            Sets environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener.
start             Starts the specified component.
status            Displays the running status of the specified component.
stop              Stops the specified component.
unsetenv          Unsets environment variables in the Oracle Restart configuration for a database, Oracle ASM instance, or listener.

For more information go to the Oracle DBA Guide

Here are a matrix of commands/ object combination:

Commands     Objects     Comment

srvctl add
srvctl modify
srvctl remove
   

instance

database
service
nodeapps
   
The OCR is modified.

srvctl relocate

service

You can reallocate a service from one named instance to another named instance.

srvctl start
srvctl stop
srvctl status

instance
database
service
asm
nodeapps
   



srvctl disable
srvctl enable
   

instance

database
service
asm
   

enable = when the server restart the resource must be restarted
disable = when the server restart the resource must NOT be restarted
              (perhaps we are working for some maintenance tasks)

srvctl config
database
service
asm


nodeapps
   
Lists configuration information from the OCR (Oracle Cluster Registry).

srvctl getenv
srvctl setenv
srvctl unsetenv
   
instance

database
service
nodeapps

srvctl getenv = displays the environment variables stored in the OCR for target.
srvctl setenv    = allows these variables to be set
srvctl unsetenv = llows these variables to be unset


The most  SRVCTL commands are:

srvctl start database -d DBname
srvctl stop database -d DBname


If you don't know the DBname you might run  select name from v$database;


srvctl start instance -d DBname -i INSTANCEname
srvctl stop instance -d DBname -i INSTANCEname

srvctl start instance -d DBname -i INSTANCEname
srvctl stop instance -d DBname -i INSTANCEname

srvctl status database -d DBname
srvctl status instance -d DBname -i INSTANCEname
srvctl status nodeapps -n NODEname

srvctl enable database -d DBname
srvctl disable database -d DBname

srvctl enable instance -d DBname -i INSTANCEname
srvctl disable instance -d DBname -i INSTANCEname


srvctl config database -d DBname      -> to get some information about the database from OCR.
srvctl getenv nodeaps

How to Kill Oracle sessions

Here I am giving the procedurehow to kill Oracle Sessions using SQL* Plus, Using Kill Command ( In Unix/Linux ) and orakill Commands ( Windows NT). 
(a) SQL* PLUS:       
In order to use SQL*PLUS Approach, first we need to find out the SID, SERIAL# of the Session which we want to kill. Then issue the following command to kill the session 
SQL> alter system kill session ‘SID, SERIAL#’; 
 When we issue the above statement, It terminates a session, rolls back ongoing transactions, releases all session locks, frees all session resources.  If the session is performing some activity that must be completed (e.g. waiting for a reply from a remote database or rolling back a transaction), Oracle waits for this activity to complete, kills the session then returns control. If the wait lasts for 60 seconds then Oracle marks the session to be killed, and returns control with a message that the session is marked to be killed. It then gets killed when the activity is complete.
  (b) Using KILL Command (Unix/Linux)     
    To kill the sessions using KILL Command, we need to find out the SPID ( Server Process ID) of the Oracle Session. Then issue the KILL Command
 $ kill -9  
(c) Using ORAKILL Command ( Windows)      
To kill the sessions using ORAKILL Command ( Windows), we need to find out the SPID of Session and ORACLE_SID of your Oracle Database. Then issue ORAKILL Command 
C:\> orakill    
Here find the practical approach 
 Using SQL*PLUS 
 SQL> select username,sid,serial#,terminal from v$session; 
USERNAME    SID    SERIAL# TERMINAL
—————————— ———- ————-
SCOTT                           134      47747 BSR       
SYS                                   147      46965 pts/0                     
                                          150          2 UNKNOWN                      
                                          151          4 UNKNOWN                              
                                          154          1 UNKNOWN                                      
                                          167          1 UNKNOWN                                
                                          168          1 UNKNOWN                              
                                          169          1 UNKNOWN                                  
                                          170          1 UNKNOWN 
18 rows selected. 
SQL> alter system kill session ’134,47747′ immediate; 
System altered.
================================================================ 
Using Unix KILL Command:
 ============================== 
SQL> SELECT s.sid,       p.spid,       s.osuser,       s.programFROM   v$process p,       v$session sWHERE  p.addr = s.paddr;
SID SPID         OSUSER                         PROGRAM
———- ———— ——————————     
170 15771        oracle                         oracle@oraprod (PMON)      
 169 15773        oracle                         oracle@oraprod (PSP0)    
 168 15775        oracle                         oracle@oraprod (MMAN)      
167 15777        oracle                         oracle@oraprod (DBW0)      
 166 15779        oracle                         oracle@oraprod (LGWR)    
 165 15781        oracle                         oracle@oraprod (CKPT)     
 164 15783        oracle                         oracle@oraprod (SMON)    
 163 15785        oracle                         oracle@oraprod (RECO)     
 162 15787        oracle                         oracle@oraprod (CJQ0)      
 161 15789        oracle                         oracle@oraprod (MMON)     
 160 15791        oracle                         oracle@oraprod (MMNL)    
 138 16425        oracle                         oracle@oraprod (J000)     
 154 15799        oracle                         oracle@oraprod (QMNC)    
 147 16259        oracle                         sqlplus@oraprod (TNS V1-V3)      
 150 15807        oracle                         oracle@oraprod (q000)      
 151 15809        oracle                         oracle@oraprod (q001)      
 137 16329        oracle                         sqlplus@oraprod (TNS V1-V3)    
 134 16380        BSR                           sqlplus.exe
 18 rows selected. 
ps -ef|grep 16380  
 oracle 16436 16236  0 17:30:37 pts/0     0:00 grep 16380 
oracle 16380     1  0 17:28:32 ?         0:00 oracleORCL  (LOCAL=NO) 
kill -9 16380$
 ====================================================================== 
Using Windows ORAKILL Command: 
======================================== 
SQL> select sid,username,program from v$session;     
   SID USERNAME                       PROGRAM
———- ——————————
  1                                ORACLE.EXE     
 2                                ORACLE.EXE   
 3                                ORACLE.EXE   
 4                                ORACLE.EXE     
 5                                ORACLE.EXE       
6                                ORACLE.EXE   
7                                ORACLE.EXE      
8                                ORACLE.EXE       
9    SCOTT              sqlplus.exe       
10    SYS                    sqlplus.exe  
Find out the SPID related to SCOTT Session (Session ID 9): 
SQL> select  s.sid,p.spid,s.osuser,s.program from v$session s, v$process p   where p.addr=s.paddr;       
 SID SPID         OSUSER                         PROGRAM
———- ———— ——————————
  1 1744         SYSTEM                         ORACLE.EXE       
  2 3716         SYSTEM                         ORACLE.EXE      
   3 3644         SYSTEM                         ORACLE.EXE      
   4 3888         SYSTEM                         ORACLE.EXE       
  5 1992         SYSTEM                         ORACLE.EXE      
   6 2092         SYSTEM                         ORACLE.EXE        
   7 2148         SYSTEM                         ORACLE.EXE      
   8 2224         SYSTEM                         ORACLE.EXE      
   9 2720         BSR                                sqlplus.exe       
10 2144        BSR                                sqlplus.exe 
10 rows selected. 
 Kill the  SPID 2720 related to SCOTT SID 9 session using orakill command 
C:\>orakill TEST 2720
 Kill of thread id 2720 in instance TEST successfully signaled.
 C:\> SQL>  select sid,username,program from v$session       
 SID USERNAME                       PROGRAM
———- ——————————
    1                                ORACLE.EXE      
   2                                ORACLE.EXE       
  3                                ORACLE.EXE      
   4                                ORACLE.EXE       
  5                                ORACLE.EXE      
   6                                ORACLE.EXE      
   7                                ORACLE.EXE       
  8                                ORACLE.EXE       
10 SYS                        sqlplus.exe 
9 rows selected 
Note:  Do not kill the sessions at the OS level as per Oracle Recommendation. As per understanding i have given the above approaches
 Reference: 
(1) Should Sessions be Killed in OS or Using Alter System Kill Session? Doc ID: Note:161794.1
               

Database Listener Hangs Intermittently

Hi Friends,

 I would like to share one thing here, about listener. Problem was Database Listener hangs intermittently in 10.2.0.1 instance. It might happen in the 10g versions(10.1.0.3, 10.1.0.4, 10.1.0.4.2, 10.1.0.5, 10.2.0.1 and 10.2.0.2). This is mostly specific to Unix/Linux platforms.
The issue was
 —> Users were not able to connect to database
 —> There was high CPU Usage for Listener process
 —> Found that, there were 2 listener processes running
We can check the listener processes by issuing the following command.
$ ps -ef | grep lsnr 
oracle  3184  1  0  May 14  ?  12:28 /oradata/oracle/bin/tnslsnr LISTENER -inherit
oracle  3988 3184  0  Jun 01  ? 10:15 /oradata/oracle/bin/tnslsnr LISTENER -inherit
Second line is the child listener process (Child Listener parent process id is the process id of parentlistener).
Just kill the child process to allow new connection until problem reoccurs.

The TNS listener can hang under load if a second spawned listener process is not closed (remains persistent).

TNS listener can hang at any time and it can happen in Standalone or RAC systems.
Here is the workaround to rectify this problem
As a Tempoary workaround, the following parameter can be added to listener.ora
(a)  SUBSCRIBE_FOR_NODE_DOWN_EVENT_=OFF
Where is listener name configured in listener.ora file
If the listener name is LISTENER, the parameter will be:
SUBSCRIBE_FOR_NODE_DOWN_EVENT_LISTENER=OFF
(b)  Locate the ons.config file in the 10g RDBS Oracle Home and rename it
      $cd $ORACLE_HOME/opmn/conf
      $mv ons.config ons.config_back
(c) Bounce the Listener after the above changes
This will prevent the listener from registering against ONS (Oracle Notification Services), which is the area affected by bug:4518443, as well as disable ONS itself.
Permanent Solution:
 Apply Patch 4518443  (Patch is available, can be downloaded from Metalink)
or
This issue is fixed in 10.2.0.3 and 10.2.0.4 Patchsets

1. Why and in what scenarios child listener process is spawned ?
                    Appearing listener child process is usual, but mostly it is happening so quick, that we will not be able to see it. When the Listener got connection request from client, it will fork itself and then that new process executes the Oracle binary to create the Server process. Sometimes, if there is a bad event (or might be at high load) or OS bug, this process get stuck and leads listener to be in hung state.
for more basic information regarding Fork unix processes
please check the following link
http://en.wikipedia.org/wiki/Fork_(operating_system)
2. What is meaning of setting SUBSCRIBE_FOR_NODE_DOWN_EVENT_=OFF
              It disables the Registering the Database Listener with ONS services.
Unfortunately there is no Oracle document which describes, what internally happens when you set or unset this undocumented parameter
3. Is there any use as of 10g for database to register with ONS ?
            Since I am not much more experienced with Advanced features in Oracle RAC Administration, as per my understanding, Listener registration with ONS is useful in RAC systems only, where you will configure advanced features like Fast Application Nofitification, Fast Application Fail over and Fast Connection Failover which inturn uses Oracle Notification Services (ONS).
Fast Connection Failover (FCF) is a integration between Oracle Application Server middle tiers and RAC databases. Fast Connection Failover uses Oracle Notification Services(ONS) for communication between the two tiers.
For more regarding Fast Conncetion Failover and Oracle Notification Services.
Please check the following clusterware and RAC Administration guide
Reference:
Note:340091.1 Intermittent TNS Listener Hang, New Child Listener Process Forked
Note:52808.1  Listener Hanging Intermittently
Note:284602.1 10g Listener: High CPU Utilization – Listener May Hang

How to detect block corruption using DBVERIFY utility

The primary tool for checking for corruption in an Oracle database is DBVERIFY. It can be used to perform a physical data structure integrity check on data files whether the database is online or offline. The big benefit of this is that DBVERIFY can be used to check backup data files without adding load to the database server. You invoke DBVERIFY from the operating system command line like this:
$ dbv file=data01.dbf logfile=verify.log blocksize=8192 feedback=100
In this example data01.dbf is the data file to check, and the tablespace this file belongs to has a block size of 8192 bytes. The feedback parameter tells DBVERIFY to draw a period on the screen after every 100 pages (blocks) of the file are verified.
In the log file you’ll see output like this:
DBVERIFY – Verification starting : FILE = data01.dbf
DBVERIFY – Verification complete
Total Pages Examined : 640
Total Pages Processed (Data) : 631
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 0
Total Pages Failing (Index): 0
Total Pages Processed (Other): 9
Total Pages Empty : 0
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
The Total Pages Failing values show the number of blocks that failed either the data block or index block checking routine. The Total Pages Marked Corrupt figure shows the number of blocks for which the cache header is invalid, thereby making it impossible for DBVERIFY to identify the block type. And the Total Pages Influx is the number of blocks for which DBVERIFY could not get a consistent image. (This could happen if the database is open when DBVERIFY is run. DBVERIFY reads blocks multiple times to try to get a consistent image, but DBVERIFY cannot get a consistent image of pages that are in flux.)
If you want to verify only a portion of a data file, you can specify a starting and ending block when running DBVERIFY. If you want to verify the entire database, you can generate a short shell script to run DBVERIFY on every data file in the database. You can do this easily using SQL*Plus:
SQL> SPOOL dbv_on_all_files.sh
SQL> SELECT ‘dbv file=’ || file_name ||
2 ‘ logfile=file’ || ROWNUM ||
3 ‘.log blocksize=8192′
4 FROM dba_data_files;
SQL> SPOOL OFF
After running the shell script you can quickly scan all of the DBVERIFY log files with Unix commands like:
$ grep Failing file*.log
$ grep Corrupt file*.log
$ grep Influx file*.log
You can also use DBVERIFY to validate a single data or index segment. To do this you must be logged onto the database with SYSDBA privileges. During the verification the segment is locked; if the segment is an index then the parent table is also locked.
There are other ways to check for database corruption besides DBVERIFY. You can take a full database export, with the dump file optionally specified as a null device. This will read every row in every user table in the database, discovering any corrupted data blocks along the way. However, this technique does not access every index entry or the entire data dictionary.
If you want to check one table and all of its indexes, you can use the ANALYZE statement to read every row of the table, read every entry in each of the table’s indexes, and make sure the table and index data are consistent with each other:
SQL> ANALYZE TABLE emp VALIDATE STRUCTURE CASCADE;
This will lock the table, preventing DML on the table, unless you specify the ONLINE keyword. Online validation reduces the amount of validation performed to allow for concurrency.
There are several ways to check for corruption in an Oracle database, but the DBVERIFY tool is the most versatile. DBVERIFY does not limit concurrency or DML while it is running, and it can be run against a database backup. Just remember that if DBVERIFY detects corruption in your database and you are planning to recover the corrupt file from a backup, you should perform a DBVERIFY validation on the backup file before beginning the recovery. This will tell you if the corruption exists in the backup also.

Block Corruption and how to identify corrupted blocks

Block corruption is while the data is being written to the data blocks, if the write to the block fails abruptly, which means there is a partial write in the block, may be because of power disruption or I/O problem, leaving no time for header to be updated, or row data to be populated, oracle leaves the block corrupt.In case of block corruption you can normally use the database unless you try to read that particular block, against which it shoots up the block corruption error.Generally block corruption occurs if write fails on the block, when the transaction is being committed. You can find detail information about block corruption in alert.log file
block corruption can happens at

1)Physical Level corruption (which means media corrupt)
2)Logical Level corruption (which means soft corrupt)

Physical corruption can be caused by defected memory boards, controllers or broken sectors on a hard disk;

Logical corruption can among other reasons be caused by an attempt to recover through a NOLOGGING action.

To Differentiate between both the corruption levels (logical and physical)

Logical corruption is header – footer – that is one of the checks, yes (it is looking for fractured blocks and when it hits one, it’ll re-read it, that is why there is no need for “alter tablespace begin backup” with rman)
Physical corruption is “we cannot read the block from disk, something is physically preventing us from doing so”

The methods to detect Block corruptions are

1) DBVerify utility
2) Block checking parameters (DB_BLOCK_CHCEKSUM) – In Oracle 10g db_block_checksum value TYPICAL which should be TRUE and db_block_checking value FULL should be TRUE.
3) ANALYZE TABLE tablename VALIDATE STRUCTURE CASCADE SQL statement
4) RMAN BACKUP command with THE VALIDATE option

OraInventory Basics

oraInventory in Oracle


You all might be familiar with oraInventory & its importance so lets take a look at few common things about it like binary/xml inventory, What to do if Inventory is corrupted , What is global / Local Inventory and where to find documentation related to oraInventory .

What is oraInventory ?
oraInventory is repository (directory) which store/records oracle software products & their oracle_homes location on a machine. This Inventory now a days in XML format and called as XML Inventory where as in past it used to be in binary format & called as binary Inventory.
There are basically two kind of Inventory Global Inventory (also called as Central Inventory) and Local Inventory also called as Oracle Home Inventory.

Global Inventory ?
Global Inventory holds information about Oracle Products on a Machine. These products can be various oracle components like database,oracle application server, collaboration suite, soa suite, forms & reports or discoverer server . This global Inventory location will be determined by file oraInst.loc in /etc (on Linux) or /var/opt/oracle (solaris). If you want to see list of oracle products on machine check for fileinventory.xml under ContentsXML in oraInventory (Please note if you have multiple global Inventory on machine check all oraInventory directories)

You will see entry like
HOME NAME="ORA10g_HOME" LOC="/u01/oracle/10.2.0/db" TYPE="O" IDX="1"/
...
...

Local Inventory
Inventory inside each Oracle Home is called as local Inventory or oracle_home Inventory. This Inventory holds information to that oracle_home only.

Can I have multiple Global Inventory on a machine ?
Quite common questions is that can you have multiple global Inventory and answer is YES you can have multiple global Inventory but if your upgrading or applying patch then change Inventory Pointer oraInst.loc to respective location. If you are following single global Inventory and if you wish to uninstall any software then remove it from Global Inventory as well.

What to do if my Global Inventory is corrupted ?
No need to worry if your global Inventory is corrupted, you can recreate global Inventory on machine using Universal Installer and attach already Installed oracle home by option
-attachHome

./runInstaller -silent -attachHome -invPtrLoc $location_to_oraInst.loc
ORACLE_HOME="Oracle_Home_Location" ORACLE_HOME_NAME="Oracle_Home_Name"
CLUSTER_NODES="{}"

Do I need to worry about oraInventory during oracle Apps 11i cloning ?
No, Rapid Clone will update both Global & Local Inventory with required information , you don't have to worry about Inventory during Oracle Apps 11i cloning.