ExaC@C DB State Failed in OCI Console While Up and Running in Reality (Fix)

Introduction

Exadata Cloud@Customer (ExaC@C) brings the best of both worlds: on-premises data sovereignty meets the innovation and capabilities of the cloud. Thanks to the control plane network that links the ExaC@C servers to OCI, users can create and manage resources through the Console or any API-based cloud tooling (Terraform, OCI CLI, SDKs, etc.). Everything you do on ExaC@C is synchronized to OCI through that layer.

 

Issue of the Day

I’ll describe a small glitch that sometimes affects a database resource. It has no impact on the database itself, which works just fine on ExaC@C. However, as shown below, a database can be marked as FAILED in OCI while it is actually up, running, and accessible.

+-------------+-----------+------------------------------------+-----------+
| Unique-Name | charset   | id                                 | state     |
+-------------+-----------+------------------------------------+-----------+
| MYCDB1_DOM  | AL32UTF8  | ocid1.database.oc1.ca-toronto-1.xxa|  FAILED   |
+-------------+-----------+------------------------------------+-----------+
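
The same state can be read for a single database from the command line. This is a minimal sketch that assumes the OCI CLI is installed and configured on the machine you run it from; the helper name db_state is hypothetical and not part of the original procedure.

```shell
# db_state DATABASE_OCID
# Prints the lifecycle state that OCI currently reports for one database.
# Assumption: the OCI CLI is installed and configured for your tenancy.
db_state() {
  oci db database get \
    --database-id "$1" \
    --query 'data."lifecycle-state"' \
    --raw-output
}
```

Call it as db_state <database_ocid>, passing the full OCID of the database resource. A FAILED result here only reflects what the control plane sees, not whether the instance is actually open.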

 

State

We need to be mindful of what the state column really means. It’s self-explanatory right after a deployment attempt, but for an existing DB the state usually reflects whether the database resource is up or down. In our case, however, OCI could no longer detect the resource, hence the “FAILED” state.

But before delving into it, let’s review how ExaCC database resources are seen & registered on the OCI side.

 

Database registration in ExaCC

DB registration makes it possible to perform admin tasks on an ExaC@C database through the OCI console and cloud tooling.
Each database created in Exadata Cloud@Customer through the API or Console is automatically registered in OCI.

There are a few exceptions, for which OCI allows a manual registration:

   – A database that you created manually on Exadata Cloud@Customer using DBCA
   – An existing database that you migrated from another platform to Exadata Cloud@Customer

This is done with the dbaascli registerdb function; read more in Registering a Database.

 

Files created after registration
Each registered database will generate a cloud registration file (DBname.ini) located under the below directory.

$ ll /var/opt/oracle/creg/*ini
… MYCDB1.ini

 

Troubleshooting 

I first checked the workaround described in the note below.
Doc ID 2764524.1 EXACS DBs Show Wrong State (Failed) on OCI Webconsole

Cause: DBs registered in CRS with the dbname in lowercase (dborcl) instead of uppercase (DBORCL).
Suggested solution: Create a symbolic link to the creg DB .ini file that matches the case of the DB name registered in CRS.

Outcome: This didn’t fix my problem, so I opened an SR to get to the bottom of it.
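
For readers who want to try that workaround first, here is a rough sketch of what the symbolic link could look like. It reflects my reading of the note (the .ini file exists under one case and CRS uses the other); the helper name link_creg_case and its arguments are hypothetical, so check the note itself before applying anything.

```shell
# link_creg_case CREG_DIR EXISTING_NAME CRS_NAME
# Creates CRS_NAME.ini as a symbolic link to EXISTING_NAME.ini in CREG_DIR,
# so a registration file exists under the name registered in CRS.
# Hypothetical helper; the link direction is an assumption based on the note.
link_creg_case() {
  local creg_dir="$1" existing="$2" crs_name="$3"
  local src="$creg_dir/$existing.ini"
  local dst="$creg_dir/$crs_name.ini"
  if [ ! -f "$src" ]; then
    echo "missing registration file: $src" >&2
    return 1
  fi
  if [ -e "$dst" ]; then
    echo "$dst already exists, nothing to do" >&2
    return 0
  fi
  ln -s "$src" "$dst"
}
```

With the names from the note, the call would be link_creg_case /var/opt/oracle/creg DBORCL dborcl.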

 

Diagnosis

This took help from support, as they have a better view of the control plane resource metadata. Looking at the content of the cloud registration file, we can see that it holds DB information usually found in CRS plus a few parameters from the spfile.

$ more /var/opt/oracle/creg/MYCDB1.ini

#################################################################
# This file is automatically generated by database as a service #
#################################################################
acfs_vol_dir=/var/opt/oracle/dbaas_acfs
acfs_vol_sizegb=10
agentdbid=83112625-52d2-4b39-b987-1b0d7d2d70cb
aloc=/var/opt/oracle/ocde/assistants
archlog=yes
bkup_asm_spfile=+DATA1/MYCDB1_DOM/spfilemycdb1.ora
…

 

Agent resource id

Notice the agentdbid in the .ini registration file. The agent resource ID is the ID that the control plane layer uses to identify and interact with the DB.
agentdbid=83112625-52d2-4b39-b987-1b0d7d2d70cb

On top of the registration file, the agent id is also written in a rec file under /var/opt/oracle/dbaas_acfs/<DBNAME>

$ more /var/opt/oracle/dbaas_acfs/MYCDB1/83112625-52d2-4b39-b98xx.rec
{
   "agentdbid" : "83112625-52d2-4b39-b987-1b0d7d2d70cb"
}
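
To confirm that the two files agree with each other, something like the following sketch can help. It only compares the local .ini and .rec files; it cannot tell you what the control plane holds. The function name check_agent_id is hypothetical.

```shell
# check_agent_id INI_FILE REC_DIR
# Reads agentdbid from the registration file and checks that a .rec file
# in REC_DIR contains the same value. Returns 0 on match, 1 on mismatch,
# 2 if the .ini file has no agentdbid entry.
check_agent_id() {
  local ini="$1" rec_dir="$2" id
  id="$(sed -n 's/^agentdbid=//p' "$ini" | head -n 1)"
  if [ -z "$id" ]; then
    echo "no agentdbid found in $ini" >&2
    return 2
  fi
  echo "ini agentdbid: $id"
  # -F: fixed string, -q: quiet, -s: stay silent if no .rec file exists
  if grep -qsF -- "\"$id\"" "$rec_dir"/*.rec; then
    echo "a .rec file under $rec_dir carries the same id"
  else
    echo "no .rec file under $rec_dir contains $id" >&2
    return 1
  fi
}
```

For the database in this post the call would be check_agent_id /var/opt/oracle/creg/MYCDB1.ini /var/opt/oracle/dbaas_acfs/MYCDB1.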

 

Root Cause

According to OCI support, the agent resource ID seen in the control plane UI console was somehow different from the agent ID in the corresponding *.ini file.

 

Solution

Take note of the agent ID communicated by the support engineer and replace the ID in both the .ini and the .rec file.

  • Take a backup of the {DBNAME}.ini file of each affected database on all DB nodes

$ sudo su - oracle
$ cd /var/opt/oracle/creg
$ cp /var/opt/oracle/creg/MYCDB1.ini /var/opt/oracle/creg/MYCDB1.ini.old
  • Replace the ID in the {DBNAME}.ini file with the Agent Resource ID value seen in the support console

-- Replace the agentdbid= value with 47098321-43d1-4b44-b997-1b0d5d1d90cb
$ vi /var/opt/oracle/creg/MYCDB1.ini
  • Remove the old .rec file that carries the wrong agent ID and create a new .rec file named after, and containing, the right one

$ rm /var/opt/oracle/dbaas_acfs/MYCDB1/83112625-52d2-4b39-b987-1b0d7d2d70cb.rec
$ vi /var/opt/oracle/dbaas_acfs/MYCDB1/47098321-43d1-4b44-b997-1b0d5d1d90cb.rec
{
   "agentdbid" : "47098321-43d1-4b44-b997-1b0d5d1d90cb"
}
  • After the change, wait an hour or so for the control plane to get in sync, then verify the DB status

+-------------+-----------+------------------------------------+-----------+
| Unique-Name | charset   | id                                 | state     |
+-------------+-----------+------------------------------------+-----------+
| MYCDB1_DOM  | AL32UTF8  | ocid1.database.oc1.ca-toronto-1.xxa| AVAILABLE |
+-------------+-----------+------------------------------------+-----------+
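
The manual edits above can also be scripted. The sketch below is only an illustration of the same steps under a few assumptions: the function name set_agent_id is hypothetical, the new ID must be the one provided by support, and the format check is deliberately loose.

```shell
# set_agent_id INI_FILE REC_DIR NEW_ID
# 1. backs up INI_FILE to INI_FILE.old
# 2. rewrites the agentdbid= line with NEW_ID
# 3. removes the .rec file named after the old id and writes a new one
set_agent_id() {
  local ini="$1" rec_dir="$2" new_id="$3" old_id
  # Loose sanity check: 36 characters made of hex digits and hyphens.
  if ! printf '%s' "$new_id" | grep -Eq '^[0-9a-fA-F-]{36}$'; then
    echo "unexpected agent id format: $new_id" >&2
    return 1
  fi
  old_id="$(sed -n 's/^agentdbid=//p' "$ini" | head -n 1)"
  if [ -z "$old_id" ]; then
    echo "no agentdbid found in $ini" >&2
    return 1
  fi
  cp -p "$ini" "$ini.old" || return 1
  # Read from the backup and write the corrected content back to the .ini file.
  sed "s/^agentdbid=.*/agentdbid=$new_id/" "$ini.old" > "$ini" || return 1
  rm -f "$rec_dir/$old_id.rec"
  printf '{\n   "agentdbid" : "%s"\n}\n' "$new_id" > "$rec_dir/$new_id.rec"
}
```

As the oracle user, the call for this post's database would be set_agent_id /var/opt/oracle/creg/MYCDB1.ini /var/opt/oracle/dbaas_acfs/MYCDB1 47098321-43d1-4b44-b997-1b0d5d1d90cb. Note that running it a second time overwrites the .old backup, so keep a separate copy if you need the original file.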

 

Can we spot/fix the agent id in Oracle Cloud Infrastructure?

As an end user, you can’t see the agent resource ID in your console. It is, unfortunately, internal metadata of the control plane. This means you will have to open an SR each time an issue like this happens. However, I have opened an enhancement request to let users see the control plane agent ID.

 

Conclusion

  • A failed database state in the OCI console doesn’t always mean the resource is down

  • It is possible that databases migrated from another platform are prone to this behavior

  • As of now, there is no way for you to see the agent resource ID that the control plane is using

  • Hopefully, control plane metadata such as the agent resource ID will become visible in a future release

  • Until then, this workaround can help those who run into this behavior
