${toc}
to achieve the best result in troubleshooting it is highly recommended to install percona toolkit
it contains tool that will help resolve replication error
Coordinator stopped because there were error(s) in the worker(s).
when we use the command show replica status\G
we found The error message you received indicates that there was a failure in the replication process on one of the worker threads
we can use the below query which will give more information on why we are reciving the error
SELECT * FROM performance_schema.replication_applier_status_by_worker;
the error we got because we have rename
command on one of the user
in general the error will generate if there is any duplication
in our case the rename on the user was already done on replica,
primary replica will stop replication if it detected any duplication
for this error we can use the below query that will ask mysql to ignore this duplication
STOP replica;
SET GLOBAL sql_slave_skip_counter = 1;
START replica;
replica stop due to duplicated data on table
If the master sends an INSERT to the replica and the replica finds that the data is already there, the replication will halt until you fix this issue.
We can simulate this scenario using the below queries. Note that this will happen if the application writes the data on the master and then attempts to write the same data on the replica.
Create Tables on Master and Replica:
-- On Master:
CREATE TABLE repl_innodb(id INT PRIMARY KEY, name1 CHAR(10), name2 CHAR(10)) ENGINE=InnoDB;
CREATE TABLE repl_myisam(id INT PRIMARY KEY, name1 CHAR(10), name2 CHAR(10)) ENGINE=MyISAM;
-- On Slave:
CREATE TABLE repl_innodb(id INT PRIMARY KEY, name1 CHAR(10), name2 CHAR(10)) ENGINE=InnoDB;
CREATE TABLE repl_myisam(id INT PRIMARY KEY, name1 CHAR(10), name2 CHAR(10)) ENGINE=MyISAM;
Insert Data on Slave without Binlog Logging:
-- On Slave:
SET sql_log_bin = 0;
INSERT INTO repl_innodb(id, name1, name2) VALUES(1, 's1062-1', 's1062-1');
INSERT INTO repl_myisam(id, name1, name2) VALUES(1, 's1062-1', 's1062-1');
SET sql_log_bin = 1;
Perform Insert Operations on Master:
-- On Master:
BEGIN;
INSERT INTO repl_innodb(id, name1, name2) VALUES(1, 'm1062-1', 'm1062-1');
COMMIT;
BEGIN;
INSERT INTO repl_innodb(id, name1, name2) VALUES(2, 'm1062-2', 'm1062-2');
COMMIT;
SELECT * FROM repl_innodb;
Simulate Conflict by Inserting Duplicate Data on Master:
-- On Master:
BEGIN;
INSERT INTO repl_innodb(id, name1, name2) VALUES(1, 'm1062-1', 'm1062-1'); -- This will cause a conflict on the replica
COMMIT;
SELECT * FROM repl_innodb;
one we use command show replica status
we will found that replication halted due to error
we can use the below query that will show why replication halted
SELECT * FROM performance_schema.replication_applier_status_by_worker\G;
To resolve replication issues caused by inserting the same value on the replica when the master tries to send the value again, you can use the Percona Toolkit, specifically the pt-slave-restart tool
. This tool helps automate the process of skipping problematic transactions that cause replication to halt.
using pt-slave-restart
To resolve the replication issue caused by inserting the same value on the replica when the master tries to send the value again, you can use Percona Toolkit’s pt-slave-restart
. This tool is part of the Percona Toolkit and is designed to automatically detect and resolve replication errors by skipping problematic transactions.
run the below command on replica
pt-slave-restart -uroot -p
one we check the status we can see that replication started again
Check MySQL Replication consistency with Percona tool pt-table-checksum
It is best practice to regularly check replication consistency, especially if you have encountered issues due to problematic transactions. Performing this action at least once a week helps confirm consistency between the master and slave databases.
we will be using one of percona toolkit called pt-table-checksum
for this action we will create user called percona
mysql> CREATE USER 'percona'@'%' IDENTIFIED BY 'Not_so_secure1!';
Query OK, 0 rows affected (0.01 sec)
mysql> Grant all privileges on *.* to 'percona'@'%';
Query OK, 0 rows affected (0.01 sec)
we will create database with pt-checksum need to use to compare and store result of checksum
mysql> create database percona;
use percona;
mysql> CREATE TABLE `dsns` (
-> `id` int(11) NOT NULL AUTO_INCREMENT,
-> `parent_id` int(11) DEFAULT NULL,
-> `dsn` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
-> PRIMARY KEY (`id`) ) ENGINE=InnoDB;
show tables;
the below command that we will use for pt-check
[root@mysql ~]#pt-table-checksum h=10.217.10.7 --user=percona --password='Not_so_secure1!' --port=3306 --no-check-binlog-format --no-check-replication-filters --replicate=percona.checksu
ms --recursion-method=dsn=D=percona,t=dsns --max-load=Threads_running=50 --max-lag=100
What concerns us in the result is DIFF_ROWS or DIFF . If it is 0, it means that we are okay. If there is a difference, it means we have an inconsistency in one of the tables.
the result will show table and database name for your reference
The command you provided is used to run pt-table-checksum
from the Percona Toolkit, which checks for data consistency between MySQL master and slave servers. Here’s a detailed explanation of each part of the command:
pt-table-checksum h=192.168.70.10 --user=percona --password='Not_so_secure1' --port=3306 --no-check-binlog-format --no-check-replication-filters \
--replicate=percona.checksums --recursion-method=dsn=D=percona,t=dsns --max-load=Threads_running=50 --max-lag=100
Breakdown of the Command:
pt-table-checksum
: The main command to run the checksum tool from Percona Toolkit.h=192.168.70.10
: Specifies the host of the MySQL server to connect to. In this case, the IP address is192.168.70.10
.--user=percona
: The MySQL user to use for the connection. Here, it ispercona
.--password='Not_so_secure1'
: The password for the MySQL user. Here, it isNot_so_secure1
.--port=3306
: The port number on which the MySQL server is listening. The default MySQL port is3306
.--no-check-binlog-format
: Disables the check for the binary log format. This is useful if you know the binlog format and want to skip this check.--no-check-replication-filters
: Disables the check for replication filters. This is useful if you want to skip this check because you know the replication filters are correctly set up.--replicate=percona.checksums
: Specifies the database and table to store the checksums. Here, the checksums are stored in thepercona.checksums
table.--recursion-method=dsn=D=percona,t=dsns
: Specifies the method to find other servers in the replication hierarchy.dsn=D=percona,t=dsns
: Uses the Data Source Name (DSN) to connect to the databasepercona
and tabledsns
to find the replication topology.D=percona
: The database where the DSN table is located.t=dsns
: The table that contains the DSNs for other servers.
--max-load=Threads_running=50
: Limits the checksum process based on server load. Here, it stops or slows down if the number of running threads exceeds50
.--max-lag=100
: Limits the checksum process based on replication lag. Here, it stops or slows down if the replication lag exceeds100
seconds.
sync you replication with percona tool pt-table-sync
If any differences are found when running pt-table-checksum
, it means we need to sync the tables using the pt-table-sync
command. for testing We will sync two tables called testrepl.repl_innodb
and testrepl.repl_myisam
.
pt-table-sync --user=percona --password='Not_so_secure1!' h=10.217.10.7 --sync-to-master --replicate percona.checksums --no-foreign-key-checks --nocheck-child-tables --tables=repltest .repl_innodb,repltest.repl_myisam --verbose --print
h
fill the slave ip
tables
fille database
.table
you want to sync and you can put multable tables
rerun pt-table-checksum
command to verify