MySQL | Maria DB Galera Cluster Concepts | Galera Cluster Components

Features of Galera Cluster:
  • Multi-master - Galera Cluster is a true multi-master cluster; all nodes are available for read and write operations.
  • Synchronous replication - Replication in Galera Cluster is synchronous. There are no slaves, so there is no slave lag; a transaction either commits on all nodes or rolls back everywhere.
  • Tightly coupled - All nodes in a Galera Cluster are tightly coupled; the data on every node is the same.
  • Performance - Galera Cluster supports multi-threaded (parallel) applying to balance the workload.
  • No failover - As there are no slaves, there is no master-slave failover.
  • Hot standby - The other nodes are always available and hold the same data, so they act as hot standbys.
  • Automatic node provisioning - When adding a new node to the cluster, there is no need to manually back up the database and restore the backup on the new node; provisioning is automatic.
  • InnoDB - Galera Cluster supports only the InnoDB storage engine (see the check query after this list).
  • Transparent to application - Galera Cluster is transparent to the application.
  • No splitting - There are no dedicated read or write nodes; all nodes accept both reads and writes, so no read/write splitting is required.
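Because Galera only replicates InnoDB tables, a quick check one might run before moving a schema onto the cluster is to list any tables that use other engines (a sketch; adjust the excluded system schemas to your environment):

-- List non-InnoDB tables that would not be replicated by Galera
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE engine <> 'InnoDB'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');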
Galera Cluster Components for MySQL | MariaDB: DBMS, Plugins, and Interface:
  • Transactional Database Management System that runs on the individual node, MySQL or MariaDB.
  • Galera Replication Plugin which enables write-set replication service functionality.
  • Group Communication Plugins, such as gcomm and Spread (described under the Group Communication Framework below).
  • wsrep API. The interface consists of wsrep hooks and dlopen():
1. wsrep hooks - integration with the database server engine for write-set replication.
2. dlopen() - the function that makes the wsrep provider available to the wsrep hooks.
wsrep API uses a Global Transaction ID in order to keep the state identical across the cluster.
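As a rough sketch of how the provider is wired in, the Galera library that dlopen() loads is configured through the wsrep_provider variable in my.cnf (the library path and node names below are assumptions and vary by platform and install):

[mysqld]
# Enable wsrep replication and point to the Galera provider library loaded via dlopen()
wsrep_on=ON
wsrep_provider=/usr/lib/galera/libgalera_smm.so
# Cluster members (hypothetical node names) and a cluster name
wsrep_cluster_address="gcomm://nodeA,nodeB,nodeC"
wsrep_cluster_name="my_galera_cluster"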
Galera Replication Plugin consists of the following components: 
  • Certification Layer which prepares the write-sets and performs certification checks on them, ensuring that they can be applied.
  • Replication Layer which manages the replication protocol and provides the total ordering capability.
Group Communication Framework:
Gcomm:
This layer provides a plugin architecture for the various group communication systems that connect to Galera Cluster.
Spread:
The Spread toolkit provides a high performance messaging service that is resilient to faults across local and wide area networks. Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point to point support. Spread services range from reliable messaging to fully ordered messages with virtual synchrony delivery guarantees.
Transaction Flow:
1. The client issues a COMMIT command.
2. All changes made by the transaction are collected into a write-set.
3. The node sends the write-set to all the other nodes.
4. Nodes receive the write-set and organize it into the global ordering.
5. Each node in the cluster, including the node that originated the write-set, runs a certification test on it using the primary keys.
6. If the test fails, the node drops the write-set and the cluster rolls back the transaction.
7. If the test succeeds, the transaction commits and the write-set is applied on the rest of the cluster.
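For example, if two nodes update the same row at nearly the same time, only one write-set passes certification; the other transaction is rolled back and its COMMIT returns a deadlock error (a sketch, assuming a table t with primary key id):

-- Issued concurrently on two different nodes:
START TRANSACTION;
UPDATE t SET val = val + 1 WHERE id = 1;
COMMIT;
-- One node commits; on the other, certification fails and the client sees:
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction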
Certification-based replication:
  • It is synchronous replication and uses the eager replication approach.
  • Write-sets bundle the database writes of a transaction into a single write-set message, avoiding the coordination of nodes one operation at a time.
  • It requires a transactional database that can roll back uncommitted changes.
  • Atomic changes - database changes are applied atomically: either they all occur, or nothing occurs.
  • Replication events are ordered globally and applied on all instances in the same order.
  • It uses group communication and a transaction ordering technique.
  • A broadcast service establishes a global total order among concurrent transactions.
  • A coordinated certification process is implemented to enforce global consistency.
  • Galera Cluster achieves synchronous replication by ensuring that transactions are copied to all nodes and executed according to a cluster-wide ordering.
  • That said, applying and committing the transaction occur asynchronously on each node as it replicates through the cluster.
How Galera Cluster handles state changes:
1. On one node in the cluster, a state change occurs on the database.
2. In the database, the wsrep hooks translate the changes into a write-set.
3. dlopen() makes the wsrep provider functions available to the wsrep hooks.
4. The Galera Replication Plugin handles write-set certification and replication to the cluster.
Provisioning:
The process of replicating data from the cluster to the individual node, bringing the node into sync with the cluster, is known as provisioning.
To provision nodes two methods are available:
01. State Snapshot Transfers (SST) : Snapshot of the entire node state transfers.
02. Incremental State Transfers (IST) : Only missing transactions transfer.
State Snapshot Transfers:
After joining the cluster, the new node initiates a State Snapshot Transfer to synchronize its data with a node that is already part of the cluster. To transfer a state from one database to another, two options are available:
1. Logical - uses mysqldump; the slowest method for State Snapshot Transfers; a blocking method, as the donor node becomes READ-ONLY for the duration of the transfer.
2. Physical - xtrabackup, rsync, rsync_wan; faster than mysqldump; copies the data files directly from server to server; can only be used on server startup; the receiving server requires a very similar configuration to the donor (for example, both servers must use the same innodb_file_per_table value).
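The SST method is selected with the wsrep_sst_method variable in my.cnf (a minimal sketch; the SST user credentials and donor name are hypothetical examples):

[mysqld]
# SST method: mysqldump (logical), rsync or xtrabackup-v2 (physical)
wsrep_sst_method=xtrabackup-v2
# Credentials the SST script uses on the donor (needed for xtrabackup and mysqldump)
wsrep_sst_auth=sstuser:sstpassword
# Optionally prefer a specific donor node
wsrep_sst_donor=nodeC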
Incremental State Transfers:
In an Incremental State Transfer (IST), the cluster provisions a node by identifying the missing transactions on the joiner and sends them only, instead of the entire state.
This provisioning method is only available:
  • When joiner node state UUID is the same as that of the group.
  • When all missing write-sets are available in the donor’s write-set cache.
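One way to judge whether a donor can serve IST is to compare the joiner's last committed seqno with the oldest write-set still held in the donor's write-set cache (a sketch; wsrep_local_cached_downto is reported by recent Galera versions):

-- On the prospective donor: the oldest seqno still available in GCache
SHOW STATUS LIKE 'wsrep_local_cached_downto';
-- The donor's last committed seqno, for comparison
SHOW STATUS LIKE 'wsrep_last_committed';
-- IST is possible only if the joiner's seqno is not older than wsrep_local_cached_downto.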

Write-set Cache (GCache): Galera Cluster stores write-sets in a special cache called the Write-set Cache, or GCache. GCache is a memory allocator for write-sets. Its primary purpose is to minimize the write-set footprint in RAM. Galera Cluster improves on this by offloading write-set storage to disk.
GCache employs three types of storage:
Permanent In-Memory Store: Here write-sets are allocated using the default memory allocator for the operating system. This is useful on systems that have spare RAM. The store has a hard size limit. By default, it is disabled.
Permanent Ring-Buffer File: Here write-set space is pre-allocated to disk during cache initialization.
This is intended as the main write-set store. By default, its size is 128MB.
On-Demand Page Store: Here write-sets are allocated to memory-mapped page files at runtime as necessary.
By default, each page file is 128MB, but a page can be larger if it needs to store a larger write-set. The size of the page store is limited by the free disk space. By default, Galera Cluster deletes page files when they are no longer in use, but you can set a limit on the total size of the page files to keep.
When all other stores are disabled, at least one page file remains present on disk.
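These stores are tuned through wsrep_provider_options in my.cnf (a sketch with illustrative values, not recommendations):

[mysqld]
# gcache.size      - size of the permanent ring-buffer file (the main write-set store)
# gcache.page_size - size of each on-demand page file
# gcache.mem_size  - size of the permanent in-memory store (0 = disabled, the default)
wsrep_provider_options="gcache.size=1G; gcache.page_size=128M; gcache.mem_size=0"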
Flow Control:
  • It is a feedback mechanism used to manage the replication process. It allows a node to pause and resume replication according to its needs.
  • Nodes receive write-sets and organize them into the global ordering. Transactions that the node receives from the cluster, but has not yet applied and committed, are kept in the received queue. When the received queue reaches a certain size, the node triggers Flow Control.
  • The node pauses replication, then works through the received queue.
Explore Flow Control from Percona XtraDB cluster : https://www.percona.com/blog/2013/05/02/galera-flow-control-in-percona-xtradb-cluster-for-mysql/
When it reduces the received queue to a more manageable size, the node resumes replication.
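The effect of Flow Control can be observed from the wsrep status counters (thresholds are controlled by provider options such as gcs.fc_limit):

-- Fraction of time replication was paused by Flow Control since the last FLUSH STATUS
SHOW STATUS LIKE 'wsrep_flow_control_paused';
-- Flow Control pause messages sent and received by this node
SHOW STATUS LIKE 'wsrep_flow_control_sent';
SHOW STATUS LIKE 'wsrep_flow_control_recv';
-- Current length of the received (slave) queue
SHOW STATUS LIKE 'wsrep_local_recv_queue';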
Node Status:
Undefined: Node that is not part of the Primary Component.
Joiner: Node that is part of the Primary Component and receiving a state snapshot transfer.
Donor: Node that is part of the Primary Component and sending a state snapshot transfer.
Joined: Node that is part of the Primary Component, in a complete state and is catching up with the cluster.
Synced: Node that is synchronized with the cluster.
Error: An error has occurred. This status string may provide an error code with more information on what occurred.
Ref.: http://galeracluster.com/documentation-webpages/notificationcmd.html#notification-cmd

Check Node status at MySQL prompt:

SHOW STATUS LIKE 'wsrep_local_state';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| wsrep_local_state | 4     |
+-------------------+-------+
Value 1 - The node starts and establishes a connection to the Primary Component.
Value 2 - When the node succeeds with a state transfer request, it begins to cache write-sets.
Value 3 - The node receives a State Snapshot Transfer; it now has all cluster data and begins to apply the cached write-sets.
The node enables Flow Control to ensure an eventual decrease in the slave queue.
Value 4 - The node finishes catching up with the cluster; its slave queue is empty and it enables Flow Control to keep it empty.
The node sets the MySQL status variable wsrep_ready to the value 1. The node is now allowed to process transactions.
Value 5 - The node receives a state transfer request. Flow Control relaxes to DONOR. The node caches all write-sets it cannot apply.
Value 6 - The node completes the state transfer to the joiner node.
Ref.: http://galeracluster.com/documentation-webpages/nodestates.html#node-state-changes 
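A human-readable form of the node state is also available as a companion to the numeric value above:

SHOW STATUS LIKE 'wsrep_local_state_comment';
-- Typical values: Joining, Joined, Donor/Desynced, Synced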
Received Queue: Transactions that the node receives from the cluster but which it has not applied and committed, are kept in the received queue.
Component: A component is a set of nodes that are connected to each other, but not to the nodes that form other components.
Primary Component:
In addition to single node failures, the cluster may be split into several components due to network failure. In such a situation, only one of the components can continue to modify the database state to avoid history divergence. This component is called the Primary Component (PC).
The grastate.dat file, located in the datadir of your database, stores the node's saved Galera state (the cluster state UUID and the node's last committed seqno).
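An illustrative grastate.dat (the version and UUID below are made-up values; recent Galera versions also record a safe_to_bootstrap flag, and a seqno of -1 indicates the node did not shut down cleanly):

# GALERA saved state
version: 2.1
uuid:    d6a51a3a-b378-11e4-924b-23b6ec126a13
seqno:   -1
safe_to_bootstrap: 0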

Graceful shutdown of a node in the cluster:

Disappearance of a node from the cluster:

Galera Arbitrator - http://galeracluster.com/documentation-webpages/arbitrator.html

The GTID (Global Transaction ID) consists of:
  • A state UUID (Universal Unique Identifier), which uniquely identifies the state and the sequence of changes it undergoes
  • An ordinal sequence number (seqno, a 64-bit signed integer) to denote the position of the change in the sequence
Ref.: https://mariadb.com/kb/en/library/uuid/
http://galeracluster.com/documentation-webpages/glossary.html#term-global-transaction-id
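On a running node, the two GTID components can be read from the wsrep status variables; together they form the GTID as uuid:seqno:

SHOW STATUS LIKE 'wsrep_local_state_uuid';  -- the cluster state UUID
SHOW STATUS LIKE 'wsrep_last_committed';    -- seqno of the most recently committed transaction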
Start MySQL service with Galera Replication and donor node:
service mysql start --wsrep_sst_donor=nodeC

Quorum: - http://galeracluster.com/documentation-webpages/weightedquorum.html#primary-component

Arbitrator: - http://galeracluster.com/documentation-webpages/arbitrator.html

Node eviction scenario: - https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster/

Restart Cluster: - http://galeracluster.com/documentation-webpages/restartingcluster.html

Node Provisioning: -  http://galeracluster.com/documentation-webpages/nodeprovisioning.html

Schema Upgrade: - http://galeracluster.com/documentation-webpages/schemaupgrades.html

Backup Database - http://galeracluster.com/documentation-webpages/backingupthecluster.html

Advantages and disadvantages of master-slave replication using Statement, Row, and Mixed mode

Ref:

http://galeracluster.com/documentation-webpages/nodestates.html
http://galeracluster.com/documentation-webpages/technicaldescription.html
https://www.slideshare.net/Grypyrg/percona-xtradb-cluster-vs-galera-cluster-vs-mysql-group-replication
https://www.percona.com/blog/2017/11/02/mysql-vs-mariadb-reality-check/
https://severalnines.com/resources/tutorials/galera-cluster-mysql-tutorial
http://galeracluster.com/documentation-webpages/restartingcluster.html
http://galeracluster.com/documentation-webpages/glossary.html#term-primary-component
http://galeracluster.com/documentation-webpages/arbitrator.html
http://galeracluster.com/documentation-webpages/weightedquorum.html#primary-component
https://www.gsp.com/cgi-bin/man.cgi?section=1&topic=galera
Quorum - http://galeracluster.com/documentation-webpages/weightedquorum.html#primary-component
Galera cluster recovery - https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster/

   
