Tuesday, June 3, 2014

Oracle Database Switch over and failover steps

Oracle Database Switch over and failover steps

Switchover and Failover for Oracle RAC Database Using Physical Standby

  1. Switchover
1)      This is for:
a.       Scheduled and unscheduled maintenance at primary site (e.g., OS patching, database patching).
b.      A primary database that is running normally --- the switchover operation must start from the primary site.
c.       Application DR Tests --- switchover can be used for a DR test that tests non-database parts (Failover is the right procedure for database DR test).
2)      Steps
a.       Shutdown all primary instances except one.
b.      Shutdown all standby instances except one.
c.       On the primary site
                                                                                 i.            SQL> select switchover_status from V$database;
                                                                                ii.            If switchover_status shows ‘sessions active’ go to step iii. If switchover_status shows ‘TO_STANDBY’ go to step v.
                                                                              iii.            SQL> select sid, process, program from v$session where type='user' and sid <>(select distinct sid from V$mystat);
                                                                              iv.            If there are rows returned, contact customers and ask if the proceses can be killed. If yes go to step v. Otherwise stop here until they log out.
                                                                               v.            SQL> alter database commit to switchover to physical standby with session shutdown;
                                                                              vi.            SQL>shutdown immediate;
                                                                            vii.            SQL>exit;
                                                                           viii.            Restart the instance and mount it as standby
         SQL> startup nomount;
         SQL> alter database mount standby database;
                                                                               ix.            Start the recovery
         SQL> alter database recover managed standby database disconnect;
d.      On the DR site.
                                                                                 i.            SQL>alter database recover managed standby database cancel;
                                                                                ii.            SQL>alter database recover managed standby database nodelay disconnect from session through last switchover; --- this is needed when a time lag used.
                                                                              iii.            SQL> alter database commit to switchover to primary with session shutdown;
                                                                              iv.            SQL> alter database open;
e.       On the original primary site mount the other instances as standby.
f.        On the original DR site start up and open the other instances (as primary).
g.       Verify the new data guard configuration
                                                                                 i.            on new primary
SQL> ALTER SYSTEM ARCHIVE LOG CURRENT;
                                                                               ii.            on new standby --- make sure the log file is copied over and applied or will be applied (if time lag in apply is used).


  1. Failover
1)      This is for:
a.       A situation when primary database is completely gone and can’t be recovered within a time window as defined by business units.
b.      A real DR test --- failover operation is performed completely on DR site. There is no dependency on DR site.
c.       Repairing or trouble shooting a database due to human mistakes or data corruption --- time lag has to be used in this case.  
d.      Documentum is currently using 4 hr time lag.
2)      Steps
a.       Identify and resolve any gaps in the archived redo logfiles.
                                                                                 i.            SQL>  select thread#, low_sequence#, high_ sequence# from v$archive_gap
                                                                                ii.            SQL>  alter database register physical logfile ‘2_820_657623969.log’
b.      Identify, copy and register any other missing archived redo logfiles.
                                                                                 i.            SQL>  select unique thread# as thread, max(sequence#) over (partition by thread#) as last from v$archived_log;
                                                                                ii.            SQL>  alter database register physical logfile ‘1_820_657623970.log’
c.       We should use log writer (lgwr) and standby redo for log transfer, which will help avoid steps a and b.
d.      On DR site shutdown all the standby instances except one.
e.       On DR site initiate a failover.
                                                                                 i.            SQL> alter database recover managed standby database finish force;
                                                                                ii.            SQL> alter database commit to switchover to primary;
                                                                              iii.            SQL> alter database open;
                                                                              iv.            SQL> shutdown immediate;
                                                                               v.            SQL> startup;
f.    Failover with a time lag and without left logs applied. This is only for repairing or trouble shooting a database due to human mistakes or data corruption.
                                                                                 i.            SQL> alter database activate physical standby database;
                                                                                ii.            SQL> shutdown immediate;
                                                                              iii.            SQL> startup;
g.       Startup and open other instances on DR site.
h.   Rebuild physical standby on the original primary site after it comes back

No comments: