Skip to main content

MySQL InnoDB cluster troubleshooting | commands


Cluster Validation:
select * from performance_schema.replication_group_members;
All members should be online.
select instance_name, mysql_server_uuid, addresses from  mysql_innodb_cluster_metadata.instances;
All instances should return same value for mysql_server_uuid
SELECT @@GTID_EXECUTED;
All nodes should return same value

Frequently use commands:

mysql> SET SQL_LOG_BIN = 0; 
mysql> stop group_replication;
mysql> set global super_read_only=0;
mysql> drop database mysql_innodb_cluster_metadata;
mysql> RESET MASTER;
mysql> RESET SLAVE ALL;

JS > var cluster = dba.getCluster()
JS > var cluster = dba.getCluster("<Cluster_name>")
JS > var cluster = dba.createCluster('name')

JS > cluster.removeInstance('root@<IP_Address>:<Port_No>',{force: true})
JS > cluster.addInstance('root@<IP add>,:<port>')
JS > cluster.addInstance('root@
<IP add>,:<port>')
JS > dba.getCluster ()
JS > dba.getCluster().status()
 

JS >dba.getCluster().checkInstanceState('root@<IP_Address>:<Port_No>')
JS >
dba.getCluster().rejoinInstance('root@<IP_Address>:<Port_No>')


Connect to cluster:
mysql-js>  \c root@<host_name>:<port_no>

Find out node is R/W  OR what R:
mysql-js> dba.getCluster().status()
Note: If you get following message, then it's time to reboot cluster from complete outage
Dba.getCluster: This function is not available through a session to a standalone instance (metadata exists, but GR is not active) (RuntimeError)

Restoring a Cluster from Quorum Loss:
Re-establish quorum using the method cluster.forceQuorumUsingPartitionOf()
JS > cluster.forceQuorumUsingPartitionOf("root@<IP address>:<port>")

Ref.: https://dev.mysql.com/doc/refman/5.7/en/mysql-innodb-cluster-working-with-cluster.html

Reboot cluster from outages:
MySQL  <node1_ip_address>:<port_no> ssl  JS > dba.rebootClusterFromCompleteOutage('<cluster_name>')
Reconfiguring the cluster '<cluster_name>' from complete outage...
The instance '<node2_ip_address>:<port_no>' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y
The instance '<node3_ip_address>:<port_no>' was part of the cluster configuration.
Would you like to rejoin it to the cluster? [y/N]: y
The cluster was successfully rebooted.
<Cluster:cluster_name>
Reboot cluster from outage:
dba.rebootClusterFromCompleteOutage('cluster_name')
Get cluster status:
 MySQL  <node1_ip_address>:<port_no> ssl  JS > var cluster = dba.getCluster()
 MySQL  <node1_ip_address>:<port_no> ssl  JS > cluster.status()
{
    "clusterName": "<cluster_name>",
    "defaultReplicaSet": {
        "name": "default",
        "primary": "<node1_ip_address>:<port_no>",
        "ssl": "REQUIRED",
        "status": "OK",
        "statusText": "Cluster is ONLINE and can tolerate up to ONE failure.",
        "topology": {
            "<node2_ip_address>:<port_no>": {
                "address": "<node2_ip_address>:<port_no>",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "<node1_ip_address>:<port_no>": {
                "address": "<node1_ip_address>:<port_no>",
                "mode": "R/W",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            },
            "<node3_ip_address>:<port_no>": {
                "address": "
                ",
                "mode": "R/O",
                "readReplicas": {},
                "role": "HA",
                "status": "ONLINE"
            }
        }
    },
    "groupInformationSourceMember": "mysql://root@<node1_ip_address>:<port_no>"
}
Describe cluster:
 MySQL  <node1_ip_address>:<port_no> ssl  JS > var cluster = dba.getCluster()
 MySQL  <node1_ip_address>:<port_no> ssl  JS > cluster.describe();
{
    "clusterName": "<cluster_name>",
    "defaultReplicaSet": {
        "name": "default",
        "topology": [
            {
                "address": "<node1_ip_address>:<port_no>",
                "label": "<node1_ip_address>:<port_no>",
                "role": "HA"
            },
            {
                "address": "<node3_ip_address>:<port_no>",
                "label": "<node3_ip_address>:<port_no>",
                "role": "HA"
            },
            {
                "address": "<node2_ip_address>:<port_no>",
                "label": "<node2_ip_address>:<port_no>",
                "role": "HA"
            }
        ]
    }
}

Remove cluster:
mysql-js> cluster.removeInstance('root@localhost:<port_no>')
Rescan cluster:
cluster.rescan()
Cluster check instance:
mysql-js> cluster.checkInstanceState('root@<node1_ip_address>:<port_no>')
mysql-js> cluster.checkInstanceState('root@<node3_ip_address>:<port_no>')
mysql-js> cluster.checkInstanceState('root@<node2_ip_address>:<port_no>')

MySQL  <node1_ip_address>:<port_no> ssl  JS > cluster.checkInstanceState('root@<node1_ip_address>:<port_no>')
Please provide the password for 'root@<node1_ip_address>:<port_no>': ********
Analyzing the instance replication state...

The instance 'root@<node1_ip_address>:<port_no>' is valid for the cluster.
The instance is fully recoverable.

{
    "reason": "recoverable",
    "state": "ok"
}

cluster.checkInstanceState('<node3_ip_address>:<port_no>')

Dissolve cluster:
mysql-js> cluster.dissolve({force:true})
Rejoin cluster:
var cluster = dba.getCluster()
cluster.rejoinInstance( 'root@<node3_ip_address>:<port_no>') 


Debug MySQL cluster:

# mysqlsh --log-level=DEBUG3

 JS > dba.verbose=2
 JS >  \c root@<IP_Address>:<Port>
 JS >   dba.configureInstance()
 JS >  MySQL  <IP_Address>:<Port>
 JS > var cluster = dba.createCluster('<cluster_name>')

Getting errors while adding node to an existing cluster:

JS > cluster.addInstance('root@<IP_Address>:<Port>')

"status": "RECOVERING"
"status": "(MISSING)"
[ERROR] [MY-011526] [Repl] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: 0d310ed3-825e-11e9-8729-005056b2df3b:1-3 > Group transactions: 23f0be3a-825b-11e9-b1d9-005056b2d6d7:1-27,
[ERROR] [MY-011583] [Repl] Plugin group_replication reported: 'For details please check performance_schema.replication_connection_status table and error log messages of Slave I/O for channel group_replication_recovery.'
[ERROR] [MY-011582] [Repl] Plugin group_replication reported: 'There was an error when connecting to the donor server. Please check that group_replication_recovery channel credentials and all MEMBER_HOST column values of performance_schema.replication_group_members table are correct and DNS resolvable.'
[ERROR] [MY-010584] [Repl] Slave I/O for channel 'group_replication_recovery': error connecting to master
ERROR: Error joining instance to cluster: '<IP_Address>:<Port>' - Query failed. MySQL Error (3092): ClassicSession.query: The server is not configured properly to be an active member of the group. Please see more details on error log.. Query: START group_replication: MySQL Error (3092):
[ERROR] [MY-011735] [Repl] Plugin group_replication reported: '[GCS] Error on opening a connection to
Solution:
Recreate empty directory for data, log, iblogs, binlog, execute following and add try to add node

mysql> SET sql_log_bin = OFF;
mysql> alter user root@localhost identified by '<code>';mysql> update mysql.user set host='%' where user='root';mysql> CREATE USER 'repl'@'%' IDENTIFIED BY '<code>';mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';mysql> FLUSH PRIVILEGES;mysql> SET sql_log_bin = ON; 

Ref.: https://dev.mysql.com/doc/refman/5.7/en/mysql-innodb-cluster-working-with-cluster.html


Getting instance belonging to an unmanaged replication group (RuntimeError):

MySQL JS > dba.getCluster()

Dba.getCluster: This function is not available through a session to an instance belonging to an unmanaged replication group (RuntimeError)
AND
InnoDB cluster status is different for mysqlsh and performance_schema.replicaion_group_members. In performance_schema it is ONLINE but mysqlsh depicts MISSING.
Solution:
Restore auto.cnf form backup, restart mysqld. If require remove instance and add it back again. 
Meta Link - Cluster.status() is out of sync with the Group Replication view replication_group_members (Doc ID 2526137.1)
About auto.cnf and server_uuid:
auto.cnf file is similar to my.cnf. It contains server_uuid. Server_uuid is generated automatically. When starting MySQL it read and use server_uuid from auto.cnf. The value of the server_uuid used in reapplication. server_uuid is true UUID in an addition to user supplied server_id system variable.

Comments

Popular posts from this blog

InnoDB cluster Remove Instance Force | Add InnoDB instance

InnoDB cluster environment UUID is different on node: To fix it stop group replication, remove instance (use force if require), add instance back Identify the node which is not in sync: Execute following SQL statement on each node and identify the node has different UUID on all nodes. mysql> select * from mysql_innodb_cluster_metadata.instances; Stop group replication: Stop group replication on the node which does not have same UUID on all nodes. mysql > stop GROUP_REPLICATION; Remove instances from cluster: Remove all secondary node from the cluster and add them back if require. $mysqlsh JS >\c root@<IP_Address>:<Port_No> JS > dba.getCluster().status() JS > dba.getCluster () <Cluster:cluster_name> JS > var cluster = dba.getCluster("cluster_name"); JS >  cluster.removeInstance('root@<IP_Address>:<Port_No>'); If you get "Cluster.removeInstance: Timeout reached waiting......" JS > cluster.removeInstance(...

MySQL slave Error_code: 1032 | MySQL slave drift | HA_ERR_KEY_NOT_FOUND

MySQL slave Error_code: 1032 | MySQL slave drift: With several MySQL, instance with master slave replication, I have one analytics MySQL, environment which is larger in terabytes, compared to other MySQL instances in the environment. Other MySQL instances with terabytes of data are running fine master, slave replication. But this analytics environment get started generating slave Error_code :1032. mysql> show slave status; Near relay log: Error_code: 1032; Can't find record in '<table_name>', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log <name>-bin.000047, end_log_pos 5255306 Near master section: Could not execute Update_rows event on table <db_name>.<table_name>; Can't find record in '<table_name>', Error_code: 1032; Can't find record in '<table_name>', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log <name>-bin.000047, end_l...