How to continue a Grid Infrastructure Patch in ExaCC when it has failed on Node 1


This week we had an issue while patching the ExaCC Grid Infrastructure (GI) from 19.10 to 19.11. The patching died while running the post-patch (rootcrs.sh -postpatch) on node 1, leaving the cluster in a "ROLLING PATCH" state: node 1 had not finished the 19.11 post-patch, and node 2 was still running 19.10.
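If you suspect you are in this state, a quick way to confirm it (as the grid user on either node) is to query the active version with the -f flag, which also prints the cluster upgrade state:

[grid@hostname1 ~]$ crsctl query crs activeversion -f

A healthy cluster reports an upgrade state of [NORMAL]; [ROLLING PATCH] means the patching never completed.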

Below is the command that was executed to start the ExaCC GI patching, along with the last entry in the log, which is where it died.

[root@hostname1 ~]# dbaascli patch db apply --patchid 32545008-GI --dbnames grid
...
2021-09-23 10:15:32.233368 - INFO: Running /u01/app/19.0.0.0/grid/crs/install/rootcrs.sh -postpatch
2021-09-23 10:15:32.233539 - Output from cmd /u01/app/19.0.0.0/grid/crs/install/rootcrs.sh -postpatch run on localhost  is:

We checked the GUI (Graphical User Interface) console, which reported the cluster as already patched to version 19.11.

[Image: ExaCC console showing the cluster patched to 19.11]

However, running opatch lspatches from the command line on both nodes showed that the patch was not finished: node 1 was already on 19.11, while node 2 was still on 19.10.

[grid@hostname1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
32847378;OCW Interim patch for 32847378
32585572;DBWLM RELEASE UPDATE 19.0.0.0.0 (32585572)
32584670;TOMCAT RELEASE UPDATE 19.0.0.0.0 (32584670)
32576499;ACFS RELEASE UPDATE 19.11.0.0.0 (32576499)
32545013;Database Release Update : 19.11.0.0.210420 (32545013)

OPatch succeeded.

[grid@hostname2 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
32240590;TOMCAT RELEASE UPDATE 19.0.0.0.0 (32240590)
32222571;OCW Interim patch for 32222571
32218663;ACFS RELEASE UPDATE 19.10.0.0.0 (32218663)
32218454;Database Release Update : 19.10.0.0.210119 (32218454)
29340594;DBWLM RELEASE UPDATE 19.0.0.0.0 (29340594)

OPatch succeeded.
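As a convenience, you can compare both nodes from a single host with a small loop. This is just a sketch, assuming passwordless ssh as the grid user and the same Grid home path on both nodes:

# Sketch: list the GI patches on both nodes from one host.
# Assumes passwordless ssh as grid and the same Grid home on each node.
for node in hostname1 hostname2; do
  echo "== $node =="
  ssh "$node" /u01/app/19.0.0.0/grid/OPatch/opatch lspatches
done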

The next thing to try was relaunching dbaascli, but it failed because it detected that a patching operation was already in progress.

[root@hostname1 ~]# dbaascli patch db apply --patchid 32545008-GI --dbnames grid
...
The current operation apply_async is blocked on node hostname1 due the following error: The current operation cannot proceed due a previous ongoing patching operation was detected

The fix for this issue is to run the post-patch manually, as root, on node 1.

[root@hostname1 ~]# /u01/app/19.0.0.0/grid/crs/install/rootcrs.sh -postpatch

Now that the post-patch has finished, the next step is to verify the stack status and the patch level.

[grid@hostname1 ~]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [1944883066].

[grid@hostname1 ~]$ crsctl query crs releasepatch
Oracle Clusterware release patch level is [1988519045] and the complete list of patches [32545013 32576499 32584670 32585572 32847378 ] have been applied on the local node. The release patch string is [19.11.0.0.0].

[grid@hostname1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
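
Note that the cluster upgrade state on node 1 still reads [ROLLING PATCH]. That is expected: node 2 is still on 19.10, so the cluster-wide active patch level cannot advance until both nodes are patched. If you want to confirm each node's individual patch level, crsctl can report it per node:

[grid@hostname1 ~]$ crsctl query crs softwarepatch hostname1
[grid@hostname1 ~]$ crsctl query crs softwarepatch hostname2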

Next, on node 2, we ran the following command to finish the patching. As you can see, it is slightly different from the initial command: the --instance1 argument explicitly points dbaascli at node 2 and its Grid home (hostname2:/u01/app/19.0.0.0/grid).

[root@hostname2 ~]# dbaascli patch db apply --patchid 32545008-GI --instance1 hostname2:/u01/app/19.0.0.0/grid --dbnames grid

Once the dbaascli command finished, both nodes showed the same patch level and the cluster upgrade state returned to NORMAL.

[grid@hostname1 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
32847378;OCW Interim patch for 32847378
32585572;DBWLM RELEASE UPDATE 19.0.0.0.0 (32585572)
32584670;TOMCAT RELEASE UPDATE 19.0.0.0.0 (32584670)
32576499;ACFS RELEASE UPDATE 19.11.0.0.0 (32576499)
32545013;Database Release Update : 19.11.0.0.210420 (32545013)

OPatch succeeded.

[grid@hostname2 ~]$ $ORACLE_HOME/OPatch/opatch lspatches
32847378;OCW Interim patch for 32847378
32585572;DBWLM RELEASE UPDATE 19.0.0.0.0 (32585572)
32584670;TOMCAT RELEASE UPDATE 19.0.0.0.0 (32584670)
32576499;ACFS RELEASE UPDATE 19.11.0.0.0 (32576499)
32545013;Database Release Update : 19.11.0.0.210420 (32545013)

OPatch succeeded.

[grid@hostname2 ~]$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [19.0.0.0.0]

[grid@hostname2 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
 
[grid@hostname2 ~]$ crsctl query crs activeversion -f
Oracle Clusterware active version on the cluster is [19.0.0.0.0]. The cluster upgrade state is [NORMAL]. The cluster active patch level is [1988519045]. 
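
If you prefer to script this final check rather than eyeball it, here is a minimal sketch (run as the grid user, assuming crsctl is in the PATH):

#!/bin/bash
# Minimal post-patch sanity check: confirm the cluster upgrade state is NORMAL.
state=$(crsctl query crs activeversion -f)
echo "$state"
if [[ "$state" == *"[NORMAL]"* ]]; then
  echo "Upgrade state is NORMAL - patching is complete."
else
  echo "WARNING: upgrade state is not NORMAL - investigate before moving on."
fi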

Hope this blog helps you out if you ever find yourself in a patching situation where the patching dies on node 1 and leaves the other node unpatched.
