
MySQL Group Replication | Group Replication

MySQL Group Replication:
Group Replication:
It is a plugin built on the existing MySQL replication infrastructure, using features such as the binary log, row-based logging, and global transaction identifiers (GTIDs). Group Replication is not a regular point-to-point connection, as in classical replication, but follows a different paradigm: group communication. It is a modular, layered piece of software. Its communication module sits behind a group communication API, and up to MySQL Group Replication 0.5.0 the engine underneath that API was Corosync.



Group replication plugin:
The plugin consists of:

  • APIs for capture, apply, and lifecycle
  • Capture, Applier, and Recovery components
  • Replication protocol logic
  • Group Communication System (GCS) API
  • Group Communication Engine (a Paxos variant, Mencius)
The engine is a Paxos-based solution named eXtended COMmunications, or simply XCom, and it is a key component of MySQL Group Replication. The key functionalities of XCom are ordered delivery, dynamic membership, and failure detection.

Paxos is probably the best-known consensus protocol. It works in two phases: a prepare/promise phase followed by an accept/accepted phase.
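The two phases can be sketched in a few lines of Python. This is a minimal, single-process illustration of classic single-decree Paxos, not XCom's actual implementation. The names `Acceptor` and `propose` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1                          # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None  # (ballot, value) last accepted

    def prepare(self, ballot: int):
        """Phase 1: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: str) -> bool:
        """Phase 2: accept the value unless a higher ballot was promised."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run both phases; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: collect promises from the acceptors.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [acc for ok, acc in replies if ok]
    if len(promises) < majority:
        return None

    # If a value was already accepted, the proposer must adopt the one
    # with the highest ballot instead of its own value.
    previous = [acc for acc in promises if acc is not None]
    if previous:
        value = max(previous, key=lambda p: p[0])[1]

    # Phase 2: ask the acceptors to accept the value.
    votes = sum(a.accept(ballot, value) for a in acceptors)
    return value if votes >= majority else None


acceptors = [Acceptor() for _ in range(3)]
first = propose(acceptors, ballot=1, value="trx-A")
second = propose(acceptors, ballot=2, value="trx-B")
stale = propose(acceptors, ballot=1, value="trx-C")
```

Once "trx-A" has been chosen, the later proposal with a higher ballot adopts it instead of its own value, and the proposal that reuses a stale ballot gets no promises at all.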

Set of APIs:

A set of APIs for capture, apply, and lifecycle controls how the plugin interacts with MySQL Server.
Interfaces:
Interfaces make information flow in both directions. From the server to the plugin, they carry notifications for events such as the server starting, the server recovering, the server being ready to accept connections, and the server being about to commit a transaction. From the plugin to the server, they let the plugin instruct the server to perform actions such as committing or aborting ongoing transactions, or queuing transactions in the relay log.

  • A Control interface that will allow a member to manage its status within a group with primitives like Join, Leave and callbacks to View Membership information.
  • A Message interface that allows a member to send and receive messages.
  • A Statistics interface to store and extract information about the Group and Messages.
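As a rough illustration, the three interfaces above could be modelled as one toy in-memory group. Everything here is hypothetical (class name, method names, the `messages_sent` counter). The real interfaces live inside the plugin and are not a Python API.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


class InMemoryGroup:
    """Toy, single-process stand-in for a group (hypothetical names)."""

    def __init__(self) -> None:
        self.members: List[str] = []
        self.view_listeners: List[Callable[[List[str]], None]] = []
        self.inboxes: Dict[str, List[Tuple[str, str]]] = {}
        self.stats: Counter = Counter()

    # Control interface: join, leave, and view-change callbacks.
    def on_view_change(self, callback: Callable[[List[str]], None]) -> None:
        self.view_listeners.append(callback)

    def join(self, member: str) -> None:
        self.members.append(member)
        self.inboxes[member] = []
        self._notify()

    def leave(self, member: str) -> None:
        self.members.remove(member)
        self.inboxes.pop(member, None)
        self._notify()

    def _notify(self) -> None:
        # Every listener receives a copy of the new membership view.
        for callback in self.view_listeners:
            callback(list(self.members))

    # Message interface: send to the whole group, receive per member.
    def send(self, sender: str, payload: str) -> None:
        for member in self.members:
            self.inboxes[member].append((sender, payload))
        self.stats["messages_sent"] += 1

    def receive(self, member: str) -> List[Tuple[str, str]]:
        messages, self.inboxes[member] = self.inboxes[member], []
        return messages

    # Statistics interface: information about the group and its messages.
    def statistics(self) -> dict:
        return {"members": len(self.members), **self.stats}


views: List[List[str]] = []
group = InMemoryGroup()
group.on_view_change(views.append)
group.join("s1")
group.join("s2")
group.send("s1", "hello")
group.leave("s2")
```

After this run, `views` holds one membership snapshot per join or leave, which is the kind of callback the Control interface describes.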
The capture component keeps track of context related to transactions that are executing.
The applier component executes remote transactions on the database.
The recovery component manages distributed recovery and gets a server that is joining the group up to date by selecting the donor, orchestrating the catch-up procedure, and reacting to donor failures.
The replication protocol module contains the specific logic of the replication protocol. It handles conflict detection, and receives and propagates transactions to the group.
The Group Communication System (GCS) API is a high-level API that abstracts the properties required to build a replicated state machine.
The Paxos-based group communication engine (XCom) handles communications with the members of the replication group.
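The conflict detection done by the replication protocol module can be pictured with a small certification sketch. It assumes that each transaction carries the set of rows it wrote plus the version it executed against, and that the first transaction to be certified wins. The `Certifier` class and the row keys are hypothetical, and the real module is considerably more involved.

```python
from typing import Dict, Set


class Certifier:
    """Toy certifier: first committer wins on overlapping write sets."""

    def __init__(self) -> None:
        self.version = 0                       # transactions committed so far
        self.last_writer: Dict[str, int] = {}  # row key -> version that last wrote it

    def certify(self, snapshot_version: int, write_set: Set[str]) -> bool:
        """Return True to commit, False to abort the transaction."""
        for row in write_set:
            # Conflict: the row changed after this transaction's snapshot.
            if self.last_writer.get(row, 0) > snapshot_version:
                return False
        self.version += 1
        for row in write_set:
            self.last_writer[row] = self.version
        return True


certifier = Certifier()
# Three transactions ran concurrently on different members at snapshot 0
# and are certified in the same agreed order everywhere.
t1 = certifier.certify(0, {"accounts:1"})
t2 = certifier.certify(0, {"accounts:1"})  # same row as t1, stale snapshot
t3 = certifier.certify(0, {"accounts:2"})  # different row, no conflict
t4 = certifier.certify(certifier.version, {"accounts:1"})  # fresh snapshot
```

Because every member sees the transactions in the same order, every member reaches the same commit or abort decision without further coordination.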
Group Membership service:
The Group Membership service is aware of the group's members at any moment in time. It allows a member to join and leave a group, and informs all interested parties of that event.

  • join()  - used by a new node to enter a group
  • leave() - used when a node decides to leave
Total Order broadcast primitive service: allows a member to send a message to a group and ensures that, if one member receives the message, then all members receive it. It also guarantees that all messages arrive in the same order at all members that belong to the group.
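The ordering guarantee and the join()/leave() primitives can be illustrated with a toy model. Note that this sketch uses a single sequencer that stamps messages with a sequence number, which is a simpler technique than the Paxos-based engine described above, and it shows only the ordering guarantee, not all-or-nothing delivery. All names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


class Member:
    def __init__(self, name: str, next_seq: int = 0) -> None:
        self.name = name
        self.next_seq = next_seq                  # next sequence number to deliver
        self.pending: List[Tuple[int, str]] = []  # min-heap of (seq, message)
        self.delivered: List[str] = []

    def receive(self, seq: int, message: str) -> None:
        """Buffer the message and deliver only in sequence order."""
        heapq.heappush(self.pending, (seq, message))
        while self.pending and self.pending[0][0] == self.next_seq:
            self.delivered.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1


class Group:
    def __init__(self) -> None:
        self.members: Dict[str, Member] = {}
        self.seq = 0

    def join(self, name: str) -> Member:
        # A new member starts at the group's current position in the order.
        member = Member(name, next_seq=self.seq)
        self.members[name] = member
        return member

    def leave(self, name: str) -> None:
        del self.members[name]

    def stamp(self, message: str) -> Tuple[int, str]:
        """Assign the next global sequence number to a message."""
        stamped = (self.seq, message)
        self.seq += 1
        return stamped


group = Group()
a, b = group.join("a"), group.join("b")
m0 = group.stamp("trx-1")
m1 = group.stamp("trx-2")

# The network hands the messages to each member in a different arrival order.
for msg in (m0, m1):
    a.receive(*msg)
for msg in (m1, m0):
    b.receive(*msg)

c = group.join("c")            # joins after two messages were ordered
c.receive(*group.stamp("trx-3"))
group.leave("b")
```

Member `b` receives "trx-2" first but holds it back until "trx-1" arrives, so both members end up with the same delivery order.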
Corosync is a cluster engine that is based on group communication. Its goal is to aid the development of reliable and highly available applications. Taking the requirements into account, it was a good first choice since:

  • It offers the needed functionality in its Closed Process Group communication model.
  • It has a C API.
  • It is a proven and deployed solution, for instance, in Pacemaker and Apache Qpid.
The drawbacks of Corosync are no support for Windows, a poor fit for multi-tenant cloud computing, and security.
Ref.:
https://dev.mysql.com/doc/refman/8.0/en/group-replication-plugin-architecture.html
http://mysqlhighavailability.com/group-communication-behind-the-scenes/
https://dev.mysql.com/worklog/task/?id=8793
Explore Pacemaker and Corosync:
https://www.lisenet.com/2016/activepassive-mysql-high-availability-pacemaker-cluster-with-drbd-on-centos-7/
https://www.digitalocean.com/community/tutorials/how-to-create-a-high-availability-setup-with-corosync-pacemaker-and-floating-ips-on-ubuntu-14-04
http://blog.ulf-wendel.de/2013/mini-poc-using-a-group-communication-system-for-mysql-ha/
http://corosync.github.io/corosync/
https://mysqlhighavailability.com/mysql-group-replication-a-small-corosync-guide/
Group Replication v/s Galera:
https://dzone.com/articles/the-quest-for-better-mysql-replication-galera-vs-group-replication




