Thursday, July 25, 2013

ASM... Some Mysterious and Important Information

ASMCMD is very slow. How can I speed it up?

The asmcmd utility appears to be very slow. This slowness is a result of queries against the v$asm_diskgroup view. To solve this problem, edit the $ORACLE_HOME/bin/asmcmdcore script and change all v$asm_diskgroup references to v$asm_diskgroup_stat.
V$asm_diskgroup and v$asm_diskgroup_stat provide exactly the same information, but the %_stat view operates from cache, while v$asm_diskgroup rescans all disk headers. This method is also used by Oracle in its Enterprise Manager product.
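As a rough sketch of that edit (assuming GNU sed; keep a copy of the original script and review every match yourself, since the contents of asmcmdcore vary by version and patch level):

$ cd $ORACLE_HOME/bin
$ cp asmcmdcore asmcmdcore.orig                       # keep the original before editing
$ grep -n 'v\$asm_diskgroup' asmcmdcore               # review each occurrence first
$ sed -i 's/v\$asm_diskgroup\b/v\$asm_diskgroup_stat/g' asmcmdcore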

ASM disk header/superblock backups?

ASM disk headers (superblocks) cannot be backed up and restored in Oracle 10g. By implication, if you use EXTERNAL REDUNDANCY and a single disk's header is accidentally overwritten, the entire disk group has to be restored. To address this, Oracle introduced the md_backup and md_restore asmcmd commands in Oracle 11g. In Oracle 10g, the only viable way to guard against logical corruption of the ASM header block is to add a failgroup; the storage vendor has no responsibility to verify or checksum ASM disk header blocks, so EXTERNAL REDUNDANCY will not help. There is a kfed utility that can back up ASM disk headers and restore them in limited scenarios, but it is best executed under the guidance of experienced support engineers; Oracle did not advertise the utility because of the potential damage it can cause. For the unrecoverable cases (tedious manual fixes), restoring the disk group is the last resort.
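For reference, the read-only half of that procedure is harmless and worth keeping in your documentation: kfed can dump a disk header as text. The device path below is only an example, and nothing should ever be written back with kfed without Oracle Support guidance.

$ kfed read /dev/raw/raw5 > raw5_header.txt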


In an Oracle RAC database there are multiple nodes, each with its own ASM instance, all accessing the same diskgroups. If you use normal mirroring in an ASM diskgroup, the behavior of the access to the disks may not be what you assume.
Suppose you have a diskgroup called DG2 with two failgroups (DG2_0000 and DG2_0001), each with a separate disk.
When something is written to the diskgroup DG2, the extents are written in a round robin manner: the first one goes to DG2_0000 with a copy going to DG2_0001, the second extent to DG2_0001 with a copy on DG2_0000, the third one back to DG2_0000 with copy on DG2_0001, and so on. In this manner ASM maintains the copy of one disk on the other.
But when the extents are read, they are always read from the primary failgroup (DG2_0000, in this case); not from the secondary (DG2_0001). The secondary is read only when the primary is not available.
This works well in most cases but sometimes may not be desirable. In Oracle Database 11g, you can configure a node to read from a specific failgroup. For instance, in the above example, if you want instance 1 to read from failgroup DG2_0000 and instance 2 to read from DG2_0001, you can set the preferred read failure groups for those diskgroups. The following command, executed in instance 1, causes the failgroups DG2_0000 and DG3_0000 in diskgroups DG2 and DG3, respectively, to be preferred on instance 1:
SQL> alter system set asm_preferred_read_failure_groups = 'DG2.DG2_0000','DG3.DG3_0000'
Similarly, on the other instance, you can issue the following to make the other failgroups the preferred disks:
SQL> alter system set asm_preferred_read_failure_groups = 'DG2.DG2_0001','DG3.DG3_0001'        
After the statements are executed, when some session from instance 1 wants to read from the diskgroup DG2, the disk DG2_0000 will be read. If the disk is not available, the other disk, DG2_0001, is read. Similarly when a session connected to the instance 2 reads the data, the disk DG2_0001 is read.
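In a RAC setup the parameter is instance-specific, so when the ASM instances share one spfile you would typically qualify the command with the sid clause (the instance names +ASM1 and +ASM2 below are only examples):
SQL> alter system set asm_preferred_read_failure_groups = 'DG2.DG2_0000','DG3.DG3_0000' sid='+ASM1' scope=both;
SQL> alter system set asm_preferred_read_failure_groups = 'DG2.DG2_0001','DG3.DG3_0001' sid='+ASM2' scope=both;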
If you want to check how the different disks of the diskgroups are utilized, you can refer to a new dictionary view, V$ASM_DISK_IOSTAT, which simulates the IOSTAT utility found in UNIX systems:
select
instname,
dbname,
group_number,
failgroup,
disk_number,
reads,
writes
from v$asm_disk_iostat
order by 1,2,3,4,5,6
/
Here is sample output:
INSTNAM DBNAME   GROUP_NUMBER FAILGROUP  DISK_NUMBER      READS     WRITES
------- -------- ------------ ---------- ----------- ---------- ----------
PRONE31 PRONE3              2 DG2_0000             0       4450        910
PRONE32 PRONE3              2 DG2_0001             1       2256        910
PRONE31 PRONE3              3 DG3_0000             0        300         29
PRONE32 PRONE3              3 DG3_0001             1        560         29
This output shows that instances PRONE31 and PRONE32 have DG2_0000 and DG2_0001 as their preferred failgroups, respectively. Note the WRITES column: the values are identical at 910 because writes go uniformly to both disks. Now note the READS column: 4450 and 2256 for instances PRONE31 and PRONE32, respectively. Why? Because instance PRONE31 issued more reads, and those reads came from its preferred failgroup, DG2_0000. As for diskgroup DG3, instance PRONE32 issued more reads, which came from its preferred failgroup (DG3_0001), and thus that disk shows more reads.
Preferred reads are especially helpful in "stretch" clusters (clusters with a large geographic distance between nodes). Directing each instance's reads to specific (typically local) disks makes reads faster.
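To see which disks are currently flagged as preferred for the instance you are connected to, you can also check the PREFERRED_READ column of V$ASM_DISK (available in 11g):
select group_number, failgroup, name, preferred_read
from v$asm_disk
order by 1,2
/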

What happens when a disk is no longer present (or destroyed beyond repair)? You want to drop the diskgroup completely and recreate it or add the disks of the diskgroup to other diskgroups. The diskgroup is not mounted yet. Since one of the disks is missing, you can't even mount it. To drop the diskgroup you have to mount it but you can't mount it because the disk is missing—a perfect "catch-22" situation. What should you do?
In Oracle Database 10g you can use a workaround by erasing the header of the disk using the dd command:
$ dd if=/dev/zero of=/dev/raw/raw13 bs=1024 count=4
This puts zeroes in the header of the disk /dev/raw/raw13, erasing all information. While this is effective, it completely erases the disk header, including the record that the disk used to be part of a diskgroup.
In Oracle Database 11g, you don't need to resort to this workaround. All you have to do is issue the drop command with a force option:
SQL> drop diskgroup dg7 force including contents;
This drops the diskgroup even though it is not mounted. The freed disks show up as FORMER; that is, they used to be part of some diskgroup. (Note: You have to use the "including contents" clause.)
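One quick way to confirm this is to look at the header status of the disks from the ASM instance:
SQL> select path, header_status from v$asm_disk where header_status = 'FORMER';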

Many people think of ASM as a database with its own storage. This is not the case at all: ASM does not store data; the database does. The ASM instance, however, maintains metadata such as the diskgroup names, the disks in them, the directories, and so on. This metadata is stored in the disk headers.
Suppose all the disks crash and the header information disappears. What do you do? Of course, you have taken the backup of the database using RMAN and you can restore it. But you can restore it only after you have created all the diskgroups and directories. Hopefully you kept a record of all that. (Right?) Even if you did, this process takes time.
What if you had a backup? In Oracle Database 11g, you can backup the metadata of the ASM instance through the ASM command line option (ASMCMD), using the command md_backup.
$ asmcmd -p

ASMCMD [+] > md_backup
It creates a file named ambr_backup_intermediate_file. Here is a portion of that file from the top:
@diskgroup_set = (
  {
    'DISKSINFO' => {
      'DG1_0000' => {
        'DG1_0000' => {
          'TOTAL_MB' => '103',
          'FAILGROUP' => 'DG1_0000',
          'NAME' => 'DG1_0000',
          'DGNAME' => 'DG1',
          'PATH' => '/dev/raw/raw5'
        }
      }
    },
    'DGINFO' => {
      'DGTORESTORE' => 0,
      'DGCOMPAT' => '10.1.0.0.0',
      'DGNAME' => 'DG1',
      'DGDBCOMPAT' => '10.1.0.0.0',
      'DGTYPE' => 'EXTERN',
      'DGAUSZ' => '1048576'
    },
    'ALIASINFO' => {},
    'TEMPLATEINFO' => {
      '6' => {
        'DGNAME' => 'DG1',
        'STRIPE' => 'COARSE',
        'TEMPNAME' => 'ASM_STALE',
        'REDUNDANCY' => 'UNPROT',
        'SYSTEM' => 'Y'
... and more ...
I have not shown the entire file here to save space. It records all the diskgroups, disks, directories, disk attributes and so on. By default, the file covers all the diskgroups. If you want to back up only a specific diskgroup, you can use the -g option. In addition, you can use the -b option to write the backup to a specifically named file.
ASMCMD [+] > md_backup -g dg1 -b prolin3_asm.backup
This backs up the metadata of the diskgroup DG1 to a file named prolin3_asm.backup instead of the default ambr_backup_intermediate_file. The target file must not already exist, so remove any file of that name before generating the backup.
Now let's see how the restore works. There are several types of restore. The simplest case is restoring a previously dropped diskgroup along with its directories. First create a directory on the diskgroup:
ASMCMD [+] > cd DG7
ASMCMD [+DG7] > mkdir TEST
ASMCMD [+DG7] > ls
TEST/
The diskgroup has a directory called TEST. Now, back up the diskgroup:
ASMCMD [+] > md_backup -g dg7 -b dg7.backup
Once backed up, suppose you drop the diskgroup to simulate an accidental drop:
SQL> drop diskgroup dg7;     

Diskgroup dropped.
Now the diskgroup DG7 is gone from the ASM instance and you want to restore it from the earlier backup. You can reinstate it using the md_restore command:
$ asmcmd md_restore -b dg7.backup -t full  

Current Diskgroup being restored: DG7
Diskgroup DG7 created!
System template TEMPFILE modified!
System template FLASHBACK modified!
System template ARCHIVELOG modified!
System template BACKUPSET modified!
System template XTRANSPORT modified!
System template DATAGUARDCONFIG modified!
System template CONTROLFILE modified!
System template AUTOBACKUP modified!
System template DUMPSET modified!
System template ONLINELOG modified!
System template PARAMETERFILE modified!
System template ASM_STALE modified!
System template CHANGETRACKING modified!
System template DATAFILE modified!
Directory +DG7/TEST re-created!
Look at the output; it creates the diskgroup as well as the templates and the directory. If there was any data in the diskgroup, of course, it is lost: md_backup is not a backup of the data but of the metadata of the ASM instance. The data, presumably, has been backed up by RMAN. After the diskgroup is created, along with all the directories, you can restore the RMAN backup to this diskgroup.
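As a minimal sketch of that last step (assuming the datafiles originally lived in +DG7 and a usable RMAN backup exists; adapt it to your own backup strategy):

RMAN> run {
        set newname for database to '+DG7';
        restore database;
        switch datafile all;
        recover database;
      }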
Another option, -f, allows you to place the commands in a script file, rather than execute them:
ASMCMD [+] > md_restore -b dg7.backup -t full -f cr_dg7.sql
It creates a SQL script named cr_dg7.sql that creates the diskgroup and all the other objects. You can run it manually in the ASM instance. Here is how the file looks:
create diskgroup DG7 EXTERNAL redundancy  disk '/dev/raw/raw14' name DG7_0000 size 100M ;
alter diskgroup /*ASMCMD AMBR*/DG7 alter template TEMPFILE attributes (UNPROTECTED COARSE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template FLASHBACK attributes (UNPROTECTED FINE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template ARCHIVELOG attributes (UNPROTECTED COARSE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template BACKUPSET attributes (UNPROTECTED COARSE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template XTRANSPORT attributes (UNPROTECTED COARSE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template DATAGUARDCONFIG attributes (UNPROTECTED COARSE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template CONTROLFILE attributes (UNPROTECTED FINE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template AUTOBACKUP attributes (UNPROTECTED COARSE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template DUMPSET attributes (UNPROTECTED COARSE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template ONLINELOG attributes (UNPROTECTED FINE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template PARAMETERFILE attributes (UNPROTECTED COARSE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template ASM_STALE attributes (UNPROTECTED COARSE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template CHANGETRACKING attributes (UNPROTECTED COARSE);
alter diskgroup /*ASMCMD AMBR*/DG7 alter template DATAFILE attributes (UNPROTECTED COARSE);
alter diskgroup /*ASMCMD AMBR */ DG7 add directory '+DG7/TEST';
One of the most useful applications of this feature is to document the metadata of the ASM instance. You can take backups at regular intervals, or after some major change, such as after adding a diskgroup, adding/removing disks or creating directories.
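For example, a scheduled job could take the backup automatically. The paths and environment file below are purely illustrative:

# crontab entry: back up ASM metadata for all diskgroups nightly at 2 a.m.
0 2 * * * . /home/oracle/asm_env.sh && asmcmd md_backup -b /backup/asm_meta_$(date +\%Y\%m\%d).backup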

One of the biggest complaints from ASM users accustomed to a traditional volume manager has been the inability to check many things from the command line. The ASM Command Line utility (ASMCMD) closes this gap to a large extent. In Oracle Database 11g, some additional commands at the ASMCMD prompt make it much easier to manage the ASM instance. One example is the metadata backup you saw earlier. Another notable one is the command to check the disks managed by the instance: lsdsk.
ASMCMD> lsdsk
Path
/dev/raw/raw10
/dev/raw/raw11
/dev/raw/raw13
... snipped ...
Without any flag, the command merely lists all the disks available to the instance. There are several flags that modify the output. The most common is -k as shown below:
ASMCMD> lsdsk -k
Total_MB  Free_MB  OS_MB  Name      Failgroup  Library  Label  UDID  Product  Redund   Path
     103       41    103  DG4_0000  DG4_0000   System                         UNKNOWN  /dev/raw/raw10
     103       41    103  DG5_0000  DG5_0000   System                         UNKNOWN  /dev/raw/raw11
... snipped ...
Another flag, -s, shows various I/O related statistics of the disks:
ASMCMD> lsdsk -s
Reads Write Read_Errs Write_Errs Read_time Write_Time Bytes_Read Bytes_Written Path
207485 207916 0 0 245.820323 159.634398 851251200 /dev/raw/raw10
207481 207912 0 0 229.996931 144.73954 851234816 /dev/raw/raw11
To quickly check the status of the disks, use the -p flag:
ASMCMD> lsdsk -p
Group_Num  Disk_Num      Incarn  Mount_Stat  Header_Stat  Mode_Stat  State   Path
        4         0  3915926174  CACHED      MEMBER       ONLINE     NORMAL  /dev/raw/raw10
        5         0  3915926175  CACHED      MEMBER       ONLINE     NORMAL  /dev/raw/raw11
        6         0  3915926193  CACHED      MEMBER       ONLINE     NORMAL  /dev/raw/raw13
Finally, the -t flag shows repair related information (described later in this installment):
ASMCMD> lsdsk -t
Create_Date  Mount_Date  Repair_Timer  Path
27-SEP-07    28-SEP-07              0  /dev/raw/raw10
27-SEP-07    28-SEP-07              0  /dev/raw/raw11
28-SEP-07    28-SEP-07              0  /dev/raw/raw13
So far, the ASMCMD commands have extracted their values from the various V$ views in the ASM instance. But the metadata is also stored on the disks themselves, and if the instance is not available there should still be a way to extract that information from the disks. In Oracle Database 11g, the lsdsk command has a flag I (uppercase "I", not lowercase "l") that extracts the information from the disk headers instead of the V$ views. Here is an example of the -k flag combined with -I, extracting the information from the disk headers:
ASMCMD> lsdsk -Ik
Total_MB  Name      Failgroup  Path
     103  DG4_0000  DG4_0000   /dev/raw/raw10
     103  DG5_0000  DG5_0000   /dev/raw/raw11
     102  DG6_0000  DG6_0000   /dev/raw/raw13
To list disks of a specific diskgroup, say, DG1, you can use the -d flag as shown below:
ASMCMD> lsdsk -t -d dg1
Create_Date  Mount_Date  Repair_Timer  Path
28-SEP-07    28-SEP-07              0  /dev/raw/raw5

You can also give a pattern for the disks:
ASMCMD> lsdsk -t /dev/raw/raw1*
Create_Date  Mount_Date  Repair_Timer  Path
27-SEP-07    28-SEP-07              0  /dev/raw/raw10
27-SEP-07    28-SEP-07              0  /dev/raw/raw11
28-SEP-07    28-SEP-07              0  /dev/raw/raw13
28-SEP-07    05-OCT-07              0  /dev/raw/raw14
This shows only the disks that match the pattern. Finally, no need to remember these options; the help command shows all the options:
ASMCMD> help lsdsk
lsdsk [-ksptcgHI] [-d <diskgroup_name>] [pattern]
The lsdsk command brings more advanced, volume-manager-like functionality to the ASM world.
Suppose you add a disk to a diskgroup. ASM immediately starts the rebalance operation. The operation is online, so ASM must coordinate with the RDBMS instance which blocks are being accessed and changed, through a complex system of locks. In a RAC database, this overhead is amplified because the locks must be managed not just within one database instance but across multiple instances.
What if you are adding the disks to a diskgroup no one is using? If ASM could somehow know that, it could eliminate the locking mechanism and make the process faster.
In Oracle Database 11g, a new way of mounting the diskgroup makes this possible. The diskgroup can be mounted with the RESTRICTED clause as shown below:
alter diskgroup dg7 mount restricted;
When the diskgroup is mounted this way, the ASM instance is aware that it has exclusive access to the underlying disks and minimizes the locking overhead. This, in turn, makes disk operations such as rebalance faster.
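A typical maintenance sequence might look like the one below (the disk path is only an example). Note that while a diskgroup is mounted RESTRICTED no database instance can use it, so remount it normally when the work is done:

SQL> alter diskgroup dg7 dismount;
SQL> alter diskgroup dg7 mount restricted;
SQL> alter diskgroup dg7 add disk '/dev/raw/raw15';
SQL> alter diskgroup dg7 dismount;
SQL> alter diskgroup dg7 mount;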
Consider a diskgroup DG2 with two failgroups, each with a single disk. When a certain area of one of the disks is damaged, it's not fatal for the diskgroup. Since the disks are mirrored, the damaged extents are read from the other, intact disk and the operation goes through. But what happens to the damaged portion of the disk?
In Oracle Database 10g, the damaged disk is taken offline and either the same disk or another one must be presented to the diskgroup. When the new disk is added, it must be completely cloned from the surviving disk to be used as a mirror. But if only a few blocks are damaged, copying the contents of a 34GB disk (for instance) is not efficient.
Therefore, in Oracle Database 11g, the damaged portions of the disks are repaired instead of copying the whole disk. This feature uses a new attribute of diskgroups, disk_repair_time, which specifies how long the ASM instance should tolerate a disk with errors before dropping it from the diskgroup. Here is how you set the attribute of the diskgroup DG2 to two hours:
SQL> alter diskgroup dg2 set attribute 'disk_repair_time'='2H';
Suppose DG2 has two disks, DISK1 and DISK2, and a few blocks on DISK2 suddenly go bad. Since the repair time is two hours, the ASM instance will not drop the disk immediately but rather will wait. If you fix the issue on DISK2 and make it online once again, then those bad blocks will be repaired from the surviving disk.
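You can confirm the setting from the V$ASM_ATTRIBUTE view (new in 11g); group_number 2 below corresponds to DG2 in this example:
SQL> select name, value from v$asm_attribute where group_number = 2;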
Let's see that with an example. Suppose the diskgroup DG2 has two failgroups. First, check the diskgroup configuration:
ASMCMD [+dg2] > lsdg dg2
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Name
MOUNTED  NORMAL  N         512   4096  1048576       206       78                0              39              0  DG2/
You can also confirm that from the du command:
ASMCMD [+dg2] > du  
Used_MB  Mirror_used_MB
     11              22
The du command confirms that 22MB is used on disk but only 11MB of it is actual data; the rest is the mirror copy. Now check the disks for the group dg2:
ASMCMD [+dg2] > lsdsk -d dg2
Path
/dev/raw/raw6
/dev/raw/raw7
You can also confirm this, as well as get the names of the disks, with:
SQL> select name, path 
2 from v$asm_disk
3 where group_number = 2
4 /

NAME     PATH
-------- -----------------------------
DG2_0000 /dev/raw/raw7
DG2_0001 /dev/raw/raw6
This shows that the diskgroup has two disks and shows their names. Now, to simulate a block corruption, write zeroes over a small part of the device:
$ dd if=/dev/zero of=/dev/raw/raw7 bs=1024 seek=10 count=1
This makes one of the disks of the diskgroup corrupt. Now force a check of the diskgroup using the new command ALTER DISKGROUP ... CHECK.
SQL> alter diskgroup dg2 check
If you check the alert log of the ASM instance, you will notice, among several other lines, the following:
...
NOTE: starting check of diskgroup DG2
WARNING: cache read a corrupted block gn=2 fn=3 indblk=1 from disk 0
...
NOTE: cache successfully reads gn 2 fn 3 indblk 1 count 15 from one mirror side
kfdp_checkDsk(): 89
...
NOTE: cache initiating offline of disk 0 group 2
WARNING: initiating offline of disk 0.3915926170 (DG2_0000) with mask 0x7e
...
WARNING: Disk (DG2_0000) will be dropped in: (7200) secs on ASM inst: (1)
...
The last line says it all. The disk you just corrupted will be dropped from the diskgroup in 7200 seconds, which corresponds to the two hours you set earlier as the repair timer. The message is repeated in the alert log as the time runs down:
...
WARNING: Disk (DG2_0000) will be dropped in: (5550) secs on ASM inst: (1)
GMON SlaveB: Deferred DG Ops completed.
Sat Oct 06 00:25:52 2007
WARNING: Disk (DG2_0000) will be dropped in: (5366) secs on ASM inst: (1)
GMON SlaveB: Deferred DG Ops completed.
Sat Oct 06 00:28:55 2007
WARNING: Disk (DG2_0000) will be dropped in: (5183) secs on ASM inst: (1)
GMON SlaveB: Deferred DG Ops completed.
Sat Oct 06 00:31:59 2007
WARNING: Disk (DG2_0000) will be dropped in: (5000) secs on ASM inst: (1)
GMON SlaveB: Deferred DG Ops completed.
...
Finally, the countdown will reach 0 and the disk will be dropped, unless you fix the issue and the disk goes through fast failure repair. If you know the disk is not repairable and should be dropped sooner, you can expedite its demise by issuing:
SQL> alter diskgroup dg2 offline disks in failgroup dg2_0000 drop after 1m
This takes the disks in failgroup dg2_0000 offline and drops them after 1 minute, allowing you to either physically remove the disk or move a different disk to the diskgroup. To drop the disk forcibly:
SQL> alter diskgroup dg2 drop disks in failgroup dg2_0001 force;
When the disk failure is resolved, you can initiate the fast repair by issuing:
SQL> alter diskgroup dg2 online disks in failgroup dg2_0001;
This starts the process of syncing the damaged and changed blocks on the disks of failgroup DG2_0001 from the surviving failgroup. Because only a few blocks are copied rather than the entire disk, this dramatically reduces the time it takes to resynchronize the disks after a small failure.
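If you prefer SQL over the alert log, the remaining countdown and the disk state can be watched from V$ASM_DISK; the REPAIR_TIMER column shows the seconds left before the disk is dropped:
SQL> select name, mode_status, repair_timer from v$asm_disk where group_number = 2;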
