
MySQL InnoDB cluster troubleshooting | commands


Cluster Validation:
mysql> SELECT * FROM performance_schema.replication_group_members;
All members should be ONLINE.
mysql> SELECT instance_name, mysql_server_uuid, addresses FROM mysql_innodb_cluster_metadata.instances;
Run this on every node; each instance has its own mysql_server_uuid, but every node must return the same set of rows.
mysql> SELECT @@GTID_EXECUTED;
All nodes should return the same value.
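
If the GTID sets differ, MySQL's built-in GTID_SUBSET() function can tell you whether one node's transactions are contained in another's. A hedged example (the GTID set literal is a placeholder you paste from the other node's @@GTID_EXECUTED):

```sql
-- On node1, paste node2's @@GTID_EXECUTED value as the first argument.
-- Returns 1 if every transaction executed on node2 is also present on node1.
SELECT GTID_SUBSET('<node2_gtid_executed>', @@GTID_EXECUTED) AS node2_contained_in_node1;
```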

Frequently used commands (note: this sequence detaches the node from the cluster and drops its metadata, turning it back into a standalone server, so use it only when rebuilding a node):

mysql> SET SQL_LOG_BIN = 0; 
mysql> stop group_replication;
mysql> set global super_read_only=0;
mysql> drop database mysql_innodb_cluster_metadata;
mysql> RESET MASTER;
mysql> RESET SLAVE ALL;

JS > var cluster = dba.getCluster()
JS > var cluster = dba.getCluster("<Cluster_name>")
JS > var cluster = dba.createCluster('name')

JS > cluster.removeInstance('root@<IP_Address>:<Port_No>',{force: true})
JS > cluster.addInstance('root@<IP_Address>:<Port_No>')
JS > dba.getCluster()
JS > dba.getCluster().status()
 

JS > dba.getCluster().checkInstanceState('root@<IP_Address>:<Port_No>')
JS > dba.getCluster().rejoinInstance('root@<IP_Address>:<Port_No>')


Connect to cluster:
mysql-js> \c root@<host_name>:<port_no>

Find out which node is R/W and which are R/O:
mysql-js> dba.getCluster().status()
Note: If you get the following message, it is time to reboot the cluster from complete outage:
Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, but GR is not active) (RuntimeError)
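
The primary can also be identified from SQL. A hedged example, assuming MySQL 8.0 (the MEMBER_ROLE column does not exist in 5.7's performance_schema):

```sql
-- Shows each member's state and whether it is the PRIMARY (R/W) node.
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_ROLE
FROM performance_schema.replication_group_members;
```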

Restoring a Cluster from Quorum Loss:
Re-establish quorum using the method cluster.forceQuorumUsingPartitionOf()
JS > cluster.forceQuorumUsingPartitionOf("root@<IP address>:<port>")

Ref.: https://dev.mysql.com/doc/refman/5.7/en/mysql-innodb-cluster-working-with-cluster.html

Reboot cluster from complete outage:
MySQL  <node1_ip_address>:<port_no> ssl  JS > dba.rebootClusterFromCompleteOutage('<cluster_name>')
Reconfiguring the cluster '<cluster_name>' from complete outage...
The instance '<node2_ip_address>:<port_no>' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y
The instance '<node3_ip_address>:<port_no>' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y
The cluster was successfully rebooted.
<Cluster:cluster_name>
Get cluster status:
 MySQL  <node1_ip_address>:<port_no> ssl  JS > var cluster = dba.getCluster()
 MySQL  <node1_ip_address>:<port_no> ssl  JS > cluster.status()
{
    "clusterName": "<cluster_name>",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "<node1_ip_address>:<port_no>",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "<node2_ip_address>:<port_no>": {
                "address": "<node2_ip_address>:<port_no>",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "<node1_ip_address>:<port_no>": {
                "address": "<node1_ip_address>:<port_no>",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "<node3_ip_address>:<port_no>": {
                "address": "<node3_ip_address>:<port_no>",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://root@<node1_ip_address>:<port_no>"
}
Describe cluster:
 MySQL  <node1_ip_address>:<port_no> ssl  JS > var cluster = dba.getCluster()
 MySQL  <node1_ip_address>:<port_no> ssl  JS > cluster.describe();
{
    "clusterName": "<cluster_name>",
    "defaultReplicaSet": {
        "name": "default",
        "topology": [
            {
                "address": "<node1_ip_address>:<port_no>",
                "label": "<node1_ip_address>:<port_no>",
                "role": "HA"
            },
            {
                "address": "<node3_ip_address>:<port_no>",
                "label": "<node3_ip_address>:<port_no>",
                "role": "HA"
            },
            {
                "address": "<node2_ip_address>:<port_no>",
                "label": "<node2_ip_address>:<port_no>",
                "role": "HA"
            }
        ]
    }
}

Remove instance from cluster:
mysql-js> cluster.removeInstance('root@localhost:<port_no>')
Rescan cluster:
cluster.rescan()
Cluster check instance:
mysql-js> cluster.checkInstanceState('root@<node1_ip_address>:<port_no>')
mysql-js> cluster.checkInstanceState('root@<node3_ip_address>:<port_no>')
mysql-js> cluster.checkInstanceState('root@<node2_ip_address>:<port_no>')

MySQL  <node1_ip_address>:<port_no> ssl  JS > cluster.checkInstanceState('root@<node1_ip_address>:<port_no>')
Please provide the password for 'root@<node1_ip_address>:<port_no>': ********
Analyzing the instance replication state...

The instance 'root@<node1_ip_address>:<port_no>' is valid for the cluster.
The instance is fully recoverable.

{
    "reason": "recoverable",
    "state": "ok"
}

cluster.checkInstanceState('<node3_ip_address>:<port_no>')

Dissolve cluster:
mysql-js> cluster.dissolve({force:true})
Rejoin instance:
var cluster = dba.getCluster()
cluster.rejoinInstance('root@<node3_ip_address>:<port_no>')


Debug MySQL cluster:

# mysqlsh --log-level=DEBUG3

 JS > dba.verbose=2
 JS > \c root@<IP_Address>:<Port>
 JS > dba.configureInstance()
 JS > var cluster = dba.createCluster('<cluster_name>')

Getting errors while adding node to an existing cluster:

JS > cluster.addInstance('root@<IP_Address>:<Port>')

"status": "RECOVERING"
"status": "(MISSING)"
[ERROR] [MY-011526] [Repl] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: 0d310ed3-825e-11e9-8729-005056b2df3b:1-3 > Group transactions: 23f0be3a-825b-11e9-b1d9-005056b2d6d7:1-27,
[ERROR] [MY-011583] [Repl] Plugin group_replication reported: 'For details please check performance_schema.replication_connection_status table and error log messages of Slave I/O for channel group_replication_recovery.'
[ERROR] [MY-011582] [Repl] Plugin group_replication reported: 'There was an error when connecting to the donor server. Please check that group_replication_recovery channel credentials and all MEMBER_HOST column values of performance_schema.replication_group_members table are correct and DNS resolvable.'
[ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master
ERROR: Error joining instance to cluster: '<IP_Address>:<Port>' - Query failed. MySQL Error (3092): ClassicSession.query: The server is not configured properly to be an active member of the group. Please see more details on error log.. Query: START group_replication: MySQL Error (3092):
[ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to
Solution:
Recreate empty directories for the data, log, InnoDB log, and binlog files, execute the following, and then try to add the node again:

mysql> SET sql_log_bin = OFF;
mysql> ALTER USER root@localhost IDENTIFIED BY '<code>';
mysql> UPDATE mysql.user SET host='%' WHERE user='root';
mysql> CREATE USER 'repl'@'%' IDENTIFIED BY '<code>';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
mysql> FLUSH PRIVILEGES;
mysql> SET sql_log_bin = ON;



Getting instance belonging to an unmanaged replication group (RuntimeError):

MySQL JS > dba.getCluster()

Dba.getCluster: This function is not available through a session to an instance belonging to an unmanaged replication group (RuntimeError)
AND
InnoDB cluster status differs between mysqlsh and performance_schema.replication_group_members: performance_schema shows the member as ONLINE, but mysqlsh reports it as MISSING.
Solution:
Restore auto.cnf from backup and restart mysqld. If required, remove the instance and add it back again.
My Oracle Support note: Cluster.status() is out of sync with the Group Replication view replication_group_members (Doc ID 2526137.1)
About auto.cnf and server_uuid:
The auto.cnf file uses the same format as my.cnf and contains the server_uuid. The server_uuid is generated automatically the first time the server starts; on every subsequent startup, MySQL reads and uses the server_uuid from auto.cnf. This value is used in replication: it is a true UUID, in addition to the user-supplied server_id system variable.
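
A sample auto.cnf looks like this (the UUID below is taken from the error-log example above; yours will differ):

```ini
[auto]
server-uuid=0d310ed3-825e-11e9-8729-005056b2df3b
```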

Following description about Oracle E-Business Suite is high level and from documentation https://docs.oracle.com/cd/E26401_01/doc.122/e22954/T202991T531062.htm#5281339 for in depth and detail description refer it. The online patching cycle phases: Prepare Apply Finalize Cutover Cleanup Prepare phase: Start a new online patching cycle, Prepares the environment for patching. $ adop phase=prepare Apply phase: Applies the specified patches to the environment. Apply one or more patches to the patch edition. $ adop phase=apply patches=123456,789101 workers=8 Finalize phase: Performs any final steps required to make the system ready for cutover. Perform the final patching operations that can be executed while the application is still online. $ adop phase=finalize Cutover phase: Shuts down application tier services, makes the patch edition the new run edition, and then restarts application tier services. This is the only phase that involves a brief ...