How to Fix MySQL Replication Failure Caused by a Full Disk


Case Scenario

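Today I found a problem in production: because monitoring did not cover it, the disk on one machine filled up, which broke MySQL master-slave replication. The symptom was as follows: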

localhost.(none)> show slave status\G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 10.xx.xx.xx
Master_User: replica
Master_Port: 5511
Connect_Retry: 60
Master_Log_File:
Read_Master_Log_Pos: 4
Relay_Log_File: relay-bin.001605
Relay_Log_Pos: 9489761
Relay_Master_Log_File:
Slave_IO_Running: No
Slave_SQL_Running: No
Last_Errno: 13121
Last_Error: Relay log read failure: Could not parse relay log event entry.
The possible reasons are: the master's binary log is corrupted (you can check this by running
'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by
running 'mysqlbinlog' on the relay log), a network problem, the server was unable to fetch a
keyring key required to open an encrypted relay log file, or a bug in the master's or
slave's MySQL code. If you want to check the master's binary log or slave's relay log,
you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

So I checked the error log, which contained the following:

2021-03-31T11:34:39.367173+08:00 11 [Warning] [MY-010897] [Repl] Storing MySQL user name or
password information in the master info repository is not secure and is therefore not
recommended. Please consider using the USER and PASSWORD connection options for START SLAVE;
see the 'START SLAVE Syntax' in the MySQL Manual for more information.

2021-03-31T11:34:39.368161+08:00 12 [ERROR] [MY-010596] [Repl] Error reading relay log
event for channel '': binlog truncated in the middle of event; consider out of disk space

2021-03-31T11:34:39.368191+08:00 12 [ERROR] [MY-013121] [Repl] Slave SQL for channel '': Relay
log read failure: Could not parse relay log event entry. The possible reasons are: the master's
binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the
slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log),
a network problem, the server was unable to fetch a keyring key required to open an encrypted
relay log file, or a bug in the master's or slave's MySQL code. If you want to check the
master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW
SLAVE STATUS' on this slave. Error_code: MY-013121

2021-03-31T11:34:39.368205+08:00 12 [ERROR] [MY-010586] [Repl] Error running query, slave SQL
thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We
stopped at log 'mysql-bin.000446' position 9489626

As the messages show, the error log is fairly smart: it noticed the disk problem and hints that we should "consider out of disk space".
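A hint like this is easy to miss by eye, so it can be worth scanning the error log for it automatically. The following is a minimal sketch, not part of the original troubleshooting. The function name `find_disk_full_lines` is hypothetical, and the second pattern ("no space left on device") is an assumed extra for generic OS-level errors that does not appear in the log above.

```python
import re

# Patterns that suggest the disk filled up. The first comes from the MySQL
# error log shown above; the second is an assumption covering generic OS errors.
# \s* keeps the match tolerant of collapsed or irregular whitespace.
DISK_FULL_HINTS = (
    re.compile(r"consider\s*out\s*of\s*disk\s*space", re.IGNORECASE),
    re.compile(r"no\s*space\s*left\s*on\s*device", re.IGNORECASE),
)


def find_disk_full_lines(log_lines):
    """Return the log lines that match any disk-full hint."""
    return [
        line
        for line in log_lines
        if any(pattern.search(line) for pattern in DISK_FULL_HINTS)
    ]


sample = [
    "2021-03-31T11:34:39.368161+08:00 12 [ERROR] [MY-010596] [Repl] Error reading "
    "relay log event for channel '': binlog truncated in the middle of event; "
    "consider out of disk space",
    "2021-03-31T11:34:39.368205+08:00 12 [ERROR] [MY-010586] [Repl] Error running "
    "query, slave SQL thread aborted.",
]

for line in find_disk_full_lines(sample):
    print(line)
```

In practice the lines would come from the real error log file, whose path depends on the server configuration.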

Solving the Problem

After logging in to the server, I quickly confirmed that disk usage on the MySQL host had reached 100%, which matches what the error log said.
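Since the root cause was a gap in monitoring, a small disk-usage check is one way to catch this earlier. Below is a minimal sketch using Python's standard `shutil.disk_usage`. The function names and the 90% threshold are assumptions for illustration, and the percentage is computed as used/total, which can differ slightly from what `df` reports because of reserved blocks.

```python
import shutil


def disk_usage_percent(path):
    """Return used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; treat them as empty.
        return 0.0
    return usage.used / usage.total * 100


def is_disk_nearly_full(path, threshold_percent=90.0):
    """Return True when usage is at or above the (assumed) alert threshold."""
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    # "." is a placeholder; point this at the MySQL data directory instead.
    path = "."
    print(f"{path}: {disk_usage_percent(path):.1f}% used")
    if is_disk_nearly_full(path):
        print("WARNING: disk is nearly full")
```

Run from cron or a monitoring agent, a check like this would raise an alert well before relay logs get truncated.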

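Now to fix it. The basic idea is to free up disk space and then rebuild the replication relationship. That sounds simple, but in practice the following error appeared while rebuilding replication: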

### GTID-based replication; trying to rebuild the replication relationship
localhost.(none)> reset slave;
ERROR 1371 (HY000): Failed purging old relay logs: Failed during log reset

localhost.(none)> reset slave all;
ERROR 1371 (HY000): Failed purging old relay logs: Failed during log reset

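Step 1: Because replication is GTID-based, it should be enough to record the show slave status output, run reset slave, and then rebuild the replication relationship with a change master statement.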

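Instead, the error above appeared. Judging from the message, MySQL could not complete the purge of the relay logs, which made little sense. Fine: if MySQL cannot purge the relay logs itself, I will do it by hand.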

Step 2: Manually delete all relay logs with rm -f. The error then changed to:

localhost.(none)> reset slave all;
ERROR 1374 (HY000): I/O error reading log index file

So the problem was still not solved.

Then I thought: since reset slave cannot clean up the relay logs, would it work to just stop slave and then run change master?

Step 3: Run stop slave and then change master directly, without running reset slave all. The result:

localhost.(none)> change master to master_host='10.13.224.31',
    -> master_user='replica',
    -> master_password='eHnNCaQE3ND',
    -> master_port=5510,
    -> master_auto_position=1;
ERROR 1371 (HY000): Failed purging old relay logs: Failed during log reset

Same problem as before.

Step 4: Replication was already broken anyway, so I ran start slave to see what would happen. The result was dramatic:

localhost.(none)> start slave;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 262
Current database: *** NONE ***

Query OK, 0 rows affected (0.01 sec)

localhost.(none)>
[root@ ~]#

After start slave ran, the instance crashed outright. At this point replication was completely broken and the replica instance was down.

Step 5: Check whether the instance can still be restarted. It came back up. After the restart, the replication status was as follows:

localhost.(none)> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Queueing master event to the relay log
Master_Host: 10.xx.xx.xx
Master_User: replica
Master_Port: 5511
Connect_Retry: 60
Master_Log_File:
Read_Master_Log_Pos: 4
Relay_Log_File: relay-bin.001605
Relay_Log_Pos: 9489761
Relay_Master_Log_File:
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 13121
Last_Error: Relay log read failure: Could not parse relay log event entry.
The possible reasons are: the master's binary log is corrupted (you can check this by running
'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by
running 'mysqlbinlog' on the relay log), a network problem, the server was unable to fetch a
keyring key required to open an encrypted relay log file, or a bug in the master's or slave's
MySQL code. If you want to check the master's binary log or slave's relay log, you will be able
to know their names by issuing 'SHOW SLAVE STATUS' on this slave.
Skip_Counter: 0

Replication was still reporting the error.

Step 6: Try reset slave all again. This time it succeeded.

localhost.(none)> stop slave;
Query OK, 0 rows affected (0.00 sec)

localhost.(none)> reset slave all;
Query OK, 0 rows affected (0.03 sec)

Step 7: Rebuild the replication relationship and start replication.

localhost.(none)> change master to master_host='10.xx.xx.xx',
    -> master_user='replica',
    -> master_password='xxxxx',
    -> master_port=5511,
    -> master_auto_position=1;
Query OK, 0 rows affected, 2 warnings (0.01 sec)

localhost.(none)> start slave;
Query OK, 0 rows affected (0.00 sec)

localhost.(none)> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.xx.xx.xx
Master_User: replica
Master_Port: 5511
Connect_Retry: 60
...
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

The replication relationship could now be established.
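The check used throughout this walkthrough, reading Slave_IO_Running and Slave_SQL_Running from the status output, can also be scripted. The sketch below parses text copied from the vertical status output into a dict and reports replication as healthy only when both threads say Yes. The function names are hypothetical, and the parser is deliberately simple: it assumes one `Key: value` pair per line and ignores wrapped continuation lines.

```python
def parse_vertical_output(text):
    """Parse `Key: value` lines from vertical status output into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip the row banner, the prompt, and wrapped continuation lines:
        # real field names contain no spaces and do not start with "*".
        if not sep or not key or " " in key or key.startswith("*"):
            continue
        status[key] = value.strip()
    return status


def replication_healthy(status):
    """Both the I/O thread and the SQL thread must report Yes."""
    return (
        status.get("Slave_IO_Running") == "Yes"
        and status.get("Slave_SQL_Running") == "Yes"
    )


sample = """
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.xx.xx.xx
Master_Port: 5511
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
"""

status = parse_vertical_output(sample)
print(replication_healthy(status))
```

Applied to the output from step 5, where Slave_SQL_Running was No, the same check would report the replica as unhealthy.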

A Brief Summary

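Once the disk fills up, the MySQL server can no longer write to its metadata tables, and the relay log may already be incomplete. If you simply free up disk space and then run change master to rebuild replication, you may hit errors that cannot be repaired directly, because this is not a normal replication-breakage scenario.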

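So the correct procedure should be:

1. Free up disk space on the server.

2. Restart the replica instance whose replication is broken.

3. Run reset slave all and change master again to rebuild the replication relationship.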
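Step 3 of this procedure can be written down as an ordered list of statements so that the order is not left to memory. The sketch below only builds the SQL strings; it does not connect to MySQL. The function name `build_rebuild_statements` is hypothetical, the quoting helper is for illustration only, and the statements mirror the ones run in steps 6 and 7, assuming GTID-based replication as in this case.

```python
def build_rebuild_statements(host, port, user, password):
    """Return the SQL statements for rebuilding replication, in order."""

    def quote(value):
        # Minimal escaping for illustration; a real client library's
        # parameter handling should be preferred in production code.
        return "'" + str(value).replace("\\", "\\\\").replace("'", "\\'") + "'"

    change_master = (
        "CHANGE MASTER TO "
        f"MASTER_HOST={quote(host)}, "
        f"MASTER_USER={quote(user)}, "
        f"MASTER_PASSWORD={quote(password)}, "
        f"MASTER_PORT={int(port)}, "
        "MASTER_AUTO_POSITION=1"
    )
    # Run these only after disk space is freed and the replica is restarted.
    return ["STOP SLAVE", "RESET SLAVE ALL", change_master, "START SLAVE"]


# Placeholder connection values, not taken from a real environment.
for statement in build_rebuild_statements("10.0.0.1", 5511, "replica", "secret"):
    print(statement + ";")
```

Keeping the restart ahead of these statements matters: in this case reset slave all kept failing until the instance had been restarted.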

If you know a better approach, I would welcome your advice.

