Mysql replication troublshooting

•

${toc}
to achieve the best result in troubleshooting it is highly recommended to install percona toolkit

it contains tool that will help resolve replication error

Coordinator stopped because there were error(s) in the worker(s).

when we use the command show replica status\G
we found The error message you received indicates that there was a failure in the replication process on one of the worker threads

56cff461021066e42a3f1766a1aa0ee6.png

we can use the below query which will give more information on why we are reciving the error

SELECT * FROM performance_schema.replication_applier_status_by_worker;

7e397ff8ce5eac6cdb9b629c2b780308.png

the error we got because we have rename command on one of the user

in general the error will generate if there is any duplication

in our case the rename on the user was already done on replica,
primary replica will stop replication if it detected any duplication

for this error we can use the below query that will ask mysql to ignore this duplication

STOP replica;
SET GLOBAL sql_slave_skip_counter = 1;
START replica;



16fc9e72edc6e57384adf5911626eb60.png
a1e254988c8447eaede07d4d01b80a4d.png

replica stop due to duplicated data on table

If the master sends an INSERT to the replica and the replica finds that the data is already there, the replication will halt until you fix this issue.

We can simulate this scenario using the below queries. Note that this will happen if the application writes the data on the master and then attempts to write the same data on the replica.


Create Tables on Master and Replica:
-- On Master:
CREATE TABLE repl_innodb(id INT PRIMARY KEY, name1 CHAR(10), name2 CHAR(10)) ENGINE=InnoDB;
CREATE TABLE repl_myisam(id INT PRIMARY KEY, name1 CHAR(10), name2 CHAR(10)) ENGINE=MyISAM;

-- On Slave:
CREATE TABLE repl_innodb(id INT PRIMARY KEY, name1 CHAR(10), name2 CHAR(10)) ENGINE=InnoDB;
CREATE TABLE repl_myisam(id INT PRIMARY KEY, name1 CHAR(10), name2 CHAR(10)) ENGINE=MyISAM;

Insert Data on Slave without Binlog Logging:
-- On Slave:
SET sql_log_bin = 0;
INSERT INTO repl_innodb(id, name1, name2) VALUES(1, 's1062-1', 's1062-1');
INSERT INTO repl_myisam(id, name1, name2) VALUES(1, 's1062-1', 's1062-1');
SET sql_log_bin = 1;

Perform Insert Operations on Master:

-- On Master:
BEGIN;
INSERT INTO repl_innodb(id, name1, name2) VALUES(1, 'm1062-1', 'm1062-1');
COMMIT;

BEGIN;
INSERT INTO repl_innodb(id, name1, name2) VALUES(2, 'm1062-2', 'm1062-2');
COMMIT;

SELECT * FROM repl_innodb;

Simulate Conflict by Inserting Duplicate Data on Master:

-- On Master:
BEGIN;
INSERT INTO repl_innodb(id, name1, name2) VALUES(1, 'm1062-1', 'm1062-1'); -- This will cause a conflict on the replica
COMMIT;

SELECT * FROM repl_innodb;



one we use command show replica status

we will found that replication halted due to error

we can use the below query that will show why replication halted

SELECT * FROM performance_schema.replication_applier_status_by_worker\G;

f70cd7f0e11b2acf56e2bfe0df633ce1.png
b37032e1af96f5004f26d8ca4b77aa72.png

To resolve replication issues caused by inserting the same value on the replica when the master tries to send the value again, you can use the Percona Toolkit, specifically the pt-slave-restart tool. This tool helps automate the process of skipping problematic transactions that cause replication to halt.

using pt-slave-restart

To resolve the replication issue caused by inserting the same value on the replica when the master tries to send the value again, you can use Percona Toolkit’s pt-slave-restart. This tool is part of the Percona Toolkit and is designed to automatically detect and resolve replication errors by skipping problematic transactions.

run the below command on replica
pt-slave-restart -uroot -p

b2a3baa742935762dd9f38b726c4d501.png

one we check the status we can see that replication started again

2051aeb7d4855f505f631636b95bb7dd.png

Check MySQL Replication consistency with Percona tool pt-table-checksum

It is best practice to regularly check replication consistency, especially if you have encountered issues due to problematic transactions. Performing this action at least once a week helps confirm consistency between the master and slave databases.

we will be using one of percona toolkit called pt-table-checksum

for this action we will create user called percona


mysql> CREATE USER 'percona'@'%' IDENTIFIED  BY 'Not_so_secure1!';
Query OK, 0 rows affected (0.01 sec)

mysql> Grant all privileges on *.* to 'percona'@'%';
Query OK, 0 rows affected (0.01 sec)


we will create database with pt-checksum need to use to compare and store result of checksum

mysql> create database percona;
use percona;
mysql> CREATE TABLE `dsns` (
    -> `id` int(11) NOT NULL AUTO_INCREMENT,
    -> `parent_id` int(11) DEFAULT NULL,
    -> `dsn` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
    -> PRIMARY KEY (`id`) ) ENGINE=InnoDB;
	
	show tables;
	
	

the below command that we will use for pt-check


[root@mysql ~]#pt-table-checksum h=10.217.10.7 --user=percona --password='Not_so_secure1!' --port=3306 --no-check-binlog-format --no-check-replication-filters  --replicate=percona.checksu
ms --recursion-method=dsn=D=percona,t=dsns --max-load=Threads_running=50 --max-lag=100

What concerns us in the result is DIFF_ROWS or DIFF . If it is 0, it means that we are okay. If there is a difference, it means we have an inconsistency in one of the tables.
the result will show table and database name for your reference

8ac5dfd8eece58c7b52252e22d9fb78c.png

The command you provided is used to run pt-table-checksum from the Percona Toolkit, which checks for data consistency between MySQL master and slave servers. Here’s a detailed explanation of each part of the command:

pt-table-checksum h=192.168.70.10 --user=percona --password='Not_so_secure1' --port=3306 --no-check-binlog-format --no-check-replication-filters \
    --replicate=percona.checksums --recursion-method=dsn=D=percona,t=dsns --max-load=Threads_running=50 --max-lag=100

Breakdown of the Command:

  1. pt-table-checksum: The main command to run the checksum tool from Percona Toolkit.
  2. h=192.168.70.10: Specifies the host of the MySQL server to connect to. In this case, the IP address is 192.168.70.10.
  3. --user=percona: The MySQL user to use for the connection. Here, it is percona.
  4. --password='Not_so_secure1': The password for the MySQL user. Here, it is Not_so_secure1.
  5. --port=3306: The port number on which the MySQL server is listening. The default MySQL port is 3306.
  6. --no-check-binlog-format: Disables the check for the binary log format. This is useful if you know the binlog format and want to skip this check.
  7. --no-check-replication-filters: Disables the check for replication filters. This is useful if you want to skip this check because you know the replication filters are correctly set up.
  8. --replicate=percona.checksums: Specifies the database and table to store the checksums. Here, the checksums are stored in the percona.checksums table.
  9. --recursion-method=dsn=D=percona,t=dsns: Specifies the method to find other servers in the replication hierarchy.
    • dsn=D=percona,t=dsns: Uses the Data Source Name (DSN) to connect to the database percona and table dsns to find the replication topology.
    • D=percona: The database where the DSN table is located.
    • t=dsns: The table that contains the DSNs for other servers.
  10. --max-load=Threads_running=50: Limits the checksum process based on server load. Here, it stops or slows down if the number of running threads exceeds 50.
  11. --max-lag=100: Limits the checksum process based on replication lag. Here, it stops or slows down if the replication lag exceeds 100 seconds.

sync you replication with percona tool pt-table-sync

If any differences are found when running pt-table-checksum, it means we need to sync the tables using the pt-table-sync command. for testing We will sync two tables called testrepl.repl_innodb and testrepl.repl_myisam.

d80e0b9ba453513420292e483ce78a4e.png
pt-table-sync --user=percona  --password='Not_so_secure1!' h=10.217.10.7 --sync-to-master --replicate percona.checksums --no-foreign-key-checks   --nocheck-child-tables  --tables=repltest .repl_innodb,repltest.repl_myisam --verbose --print


h fill the slave ip
tables fille database.table you want to sync and you can put multable tables

47cd8b0973b48054e2a6f5fb90c5f9ff.png

rerun pt-table-checksum command to verify