Oracle RAC 12c: A Recipe to Recover From Losing OCR, Voting Disk, or ASM Password

Starting with Oracle 11gR2, it is difficult to lose your OCR file or voting disk as they are stored in an ASM disk group with normal or high redundancy.

The scenario below will corrupt the header of oracleasm disks that belong to disk group that has an OCR file, a voting disk, and ASM Password file.

Then we will show the steps and commands to recover from losing those files by

  • Restoring the OCR file from the latest automatic backup
  • Adding a new voting disk
  • Creating a new ASM password file and adding key users (SYS, ASMSNMP, CRSUSER__ASM_001) to password file

 

Unfortunately, there is NO other option to remove OCR file. We can’t manually delete OCR file from ASM, because the file is being accessed.

For example,

ASMCMD> rm -rf +DATA/rac-cluster/OCRFILE/REGISTRY.255.966255313
ORA-15032: not all alterations performed
ORA-15028: ASM file '+DATA/rac-cluster/OCRFILE/REGISTRY.255.966255313' not dropped; currently being accessed (DBD ERROR: OCIStmtExecute)

Also, when cluster or CRS is down, we are not able to get to the OCR file as it is not available.

For example,

[oracle@rac1 ~]$ asmcmd
Connected to an idle instance.
ASMCMD> ls
ASMCMD-8102: no connection to Oracle ASM; command requires Oracle ASM to run
ASMCMD>

 

Assumptions

  • OCR disk group name is +REDO.
  • Database datafiles, controlfile, archives are located on another ASM disk groups.
  • Only database redo log files are located on +REDO.
  • ASM parameter file and password file are located on +REDO.
  • Starting with 12cR2, OCR file automatic backups are located on ASM. In our example +DATA.

 

Scenario preparation
Step# Description
1 Take a backup for ASM SPFILE file.

— grid account
SQL> create pfile=’/backup/ASM/init+ASM.ora’ from spfile;

2 Take a backup for ASM metadata.

— grid account
ASMCMD> md_backup /backups/ASM/metadata.bkp -G data,redo,reco
ASMCMD> md_restore /backups/ASM/metadata.bkp -S /backups/ASM/metadata.sql

3 Take a backup for the last OCR automatic backup.

— grid account
$ ocrconfig -showbackup
ASMCMD> cd data/<DB_UNIQUE_NAME>/OCRBACKUP
ASMCMD> cp backup00.ocr.313.965472125 /backup/OCR/ocr.bkp

4 Run below commands to show OCR & voting disk info and save the output as a reference.

— grid account
$ ocrcheck
$ crsctl query css votedisk

5 Query oracleasm disk info for REDO disks and save the output as a reference to be used later when we recreate the disks again.

— root account
$ oracleasm querydisk -p REDO1
$ oracleasm querydisk -p REDO2
$ oracleasm querydisk -p REDO3

6 This is an optional as long as databases files are stored on another disk groups – Take a backup for your databases using RMAN and store backup on file system.
Corrupt REDO disks header and cleanup all oracle processes
Step# Description
1 Corrupt REDO disks header using DD command from one node only.

— root account
$ dd if=/dev/zero of=/dev/oracleasm/disks/REDO1 bs=8192
$ dd if=/dev/zero of=/dev/oracleasm/disks/REDO2 bs=8192
$ dd if=/dev/zero of=/dev/oracleasm/disks/REDO3 bs=8192

2 Most of CRS resources will crash and servers may reboot. We need to cleanup oracle processes on all cluster nodes.

— root account
$ crsctl stop crs -f
$ ps -ef | grep d.bin  🡺  kill -9 <PID>
$ ps -ef | grep oracle 🡺 kill -9 <PID>
$ ps -ef | grep grid 🡺 kill -9 <PID>

3 Cleanup oracleasm disks on all cluster nodes.

— root account
$ oracleasm scandisks
$ oracleasm listdisks

Recover from losing OCR disk group
Step# Description
1 Create oracleasm disks for REDO disk group. Run from one node only then scan disks on another nodes.

— root account
$ oracleasm createdisk REDO1 /dev/mapper/mpathm
$ oracleasm createdisk REDO2 /dev/mapper/mpathk
$ oracleasm createdisk REDO3 /dev/mapper/mpathn
$ oracleasm scandisks
$ oracleasm listdisks

— On remaining nodes as root account

$ oracleasm scandisks
$ oracleasm listdisks

2 Start ASM instance on one node only in exclusive mode.

— root account
$ crsctl start crs -excl -nocrs

3 Check CRSD is down and stop it if it is not (STATE=OFFLINE).

— root account
$ crsctl status resource ora.crsd -init

4 Query voting disks and drop them if they exist.

— root account
$ crsctl query css votedisk
$ crsctl delete css votedisk +REDO

5 Connect to ASM instance and drop REDO disk group if it still exists.

— grid account
SQL> select name from v$asm_diskgroup where name=’ REDO’;
SQL> drop diskgroup REDO force including contents;

6 Connect to ASM instance and set ASM parameter asm_diskstring.

— grid account
SQL> alter system set asm_diskstring=’/oracleasm/disks/*’;

7 Create REDO disk group using the metadata backup script we created earlier.

Copy SQL commands, which are related to REDO disk group from backup script, connect to ASM instance, then run the commands.

— grid account
$ cat /backups/ASM/metadata.sql | grep REDO > /backups/ASM/REDO_metadata.sql
SQL> @/backups/ASM/REDO_metadata.sql

8 Restore the OCR from backup done before.

— root account
$ ocrconfig -restore /backup/OCR/ocr.bkp
$ ocrcheck

9 Add voting disks.

— root account
$ crsctl replace votedisk +REDO
$ crsctl query css votedisk

10 Connect to ASM instance and create ASM SPFILE

— grid account
SQL> create spfile=’+REDO’ from pfile=’/backup/ASM/init+ASM.ora’;
ASMCMD> cd REDO/<DB_UNIQUE_NAME>/ASMPARAMETERFILE
ASMCMD> ls -lt
ASMCMD> spset +REDO/<DB_UNIQUE_NAME>/ASMPARAMETERFILE/registry.253.966422601
ASMCMD> spget

11 CRS is required to be running in non-exclusive mode to recreate the ASM password file in the disk group.  This means some ‘application’ cluster resources will be started and available now.

— root account
$ crsctl stop crs -f
$ crsctl start crs -wait

12 Create ASM password file on REDO disk group.

— grid account

12.1. Create password file

ASMCMD> pwget –asm
ASMCMD> pwdelete –asm
ASMCMD> pwcreate –asm +DATA/orapwASM ‘<password>’ -f
ASMCMD> pwget –asm

12.2. Grant sysasm to SYS

ASMCMD> orapwusr –grant sysasm SYS
ASMCMD> lspwusr

12.3. Add ASMSNMP user and grant sysasm & sysdba

ASMCMD> orapwusr –add ASMSNMP  è enter the same password used to create password file
ASMCMD> orapwusr –grant sysdba ASMSNMP
ASMCMD> orapwusr –grant sysasm ASMSNMP
ASMCMD> lspwusr

12.4. Add user CRSUSER__ASM_001 to ASM password file

12.4.1. Dump OCR file and search dump file for “CRSUSER__ASM_001” to get user “ORATEXT”

$ $GRID_HOME/bin/ocrdump /tmp/OCR.dmp
$ vi /tmp/OCR.dmp
For example,
[SYSTEM.ASM.CREDENTIALS.USERS.CRSUSER__ASM_001]
ORATEXT : a332f4dbe3da5f4bbf3a71895a466319:grid

12.4.2. Get user “CRSUSER__ASM_001” password

$ /prod01/home/dba/grid/12.2.0.1/bin/crsctl get credmaint -path /ASM/Self/a332f4dbe3da5f4bbf3a71895a466319 -credtype userpass -id 0 -attr user -local
CRSUSER__ASM_001
$ /prod01/home/dba/grid/12.2.0.1/bin/crsctl get credmaint -path /ASM/Self/a332f4dbe3da5f4bbf3a71895a466319 -credtype userpass -id 0 -attr passwd  -local
eZX8GNed1KfM3E92U57nSziL4HO5E

12.4.3. Add user CRSUSER__ASM_001 to password file

ASMCMD> orapwusr –add CRSUSER__ASM_001  è Use password you got it from previous command
ASMCMD> orapwusr –grant sysdba CRSUSER__ASM_001
ASMCMD> orapwusr –grant sysasm CRSUSER__ASM_001
ASMCMD> lspwusr

13 Stop and start CRS normally. All resources should start normally.

— root account
$ crsctl stop crs -f
$ crsctl start crs -wait
$ crctl check cluster
$ crsctl status res -t

14 Start CRS on the remaining cluster node(s). All resources should start normally.

— root account
$ crsctl start crs -wait
$ crctl check cluster
$ crsctl status res -t

15 Run a cluster verification of all cluster nodes. Run from one node only.

— grid account
$ cd $ORACLE_HOME/bin
$ cluvfy comp ocr -n all -verbose

16 Check the status of cluster resources. Resources should be ONLINE on all nodes.

— grid account
$ crsctl status res -t
$ srvctl status asm
$ srvctl status scan
$ srvctl status scan_listener
$ srvctl status listener

17 Stop database(s) then startup mount to recreate redo log groups on REDO disk group.

— oracle account
$ srvctl stop database -d <DB_NAME>
SQL> startup mount
SQL> select * from v$log;
SQL> select member from v$logfile;
SQL> alter database drop logfile group &id; è repeat for all redo log groups have member(s) on REDO
SQL> alter database add logfile thread &THD_NUM group &GRP_NUM (‘+REDO’) size <SIZE_KB>;
SQL> select * from v$log;
SQL> select member from v$logfile;

18 Stop database(s) then startup.

— oracle account
SQL> shutdown immediate
$ srvctl start database -d <DB_NAME>
SQL> select name,database_role,open_mode from v$database;
SQL> select instance_name,host_name,status from gv$instance;

Share:

No Comments

No comments yet.

RSS feed for comments on this post.

Leave a comment