I Might Be Wrong

MySQL relay-log-space-limit vs. your page cache

Posted in MySQL by Leif Ryge on September 10, 2010

MySQL slaves have two replication threads, the IO_THREAD and the SQL_THREAD. The first continuously fetches binary logs from the master and writes them as relay logs, creating a new file every time the current one reaches max-relay-log-size. The second continuously reads those relay logs, applies them to the database, and deletes each file when it is finished with it. So, when everything is running normally, and the SQL thread is keeping up with the master, there will not be more than two relay log files.

There are numerous reasons why  MySQL replication might lag briefly and then recover. The situation which caused me to write this post was due to a spike in writes on a database that was already close to being disk-bound under normal conditions. (Even though the master and slaves have identical hardware, and the slaves get no traffic other than replication because they’re intended only for high availability, the slaves hit the IOPS bottleneck first due to having only one SQL thread.)

When the SQL thread is not able to keep up, logs will accumulate. You can set the location of your relay logs with the relay-log option; in the default configuration (and many production configurations), the relay logs are stored on the same disk as the rest of MySQL’s data. If the total size of the relay logs ever exceeds relay-log-space-limit, the IO thread will pause until the SQL thread has moved on to its next file and deleted its previous one.

Unfortunately, relay-log-space-limit defaults to zero (unlimited). This means that when the SQL thread can’t keep up, the slave’s relay logs will continue to accumulate indefinitely (until its disk fills up). When the total size of those logs approaches the amount of vfs page cache available, the SQL thread will begin to read them from disk (instead of getting cache hits), which will exacerbate the IO shortage that caused the problem in the first place.

Setting relay-log-space-limit will cause the slave’s IO thread to pause and resume as needed, so that the slave is always reading relay logs from its page cache. You should probably do that.

This will have a side effect of making lagged slaves cause cache misses on the master instead, because the IO thread will now be reading older binary logs which have already fallen out of cache. But, depending on your application, this might be just what you want: if the extra disk contention from reading those logs on the master slows down the application’s writes, it can effectively throttle everything until the slave catches up.

If you’re like I was a few weeks ago, you’re learning this while you’ve got a lagged slave that is thrashing its disk needlessly reading relay logs, and you face a dilemma: changing relay-log-space-limit requires restarting MySQL. If you have a large innodb_buffer_pool_size that you need to keep warmed up, restarting can take a while… during which time the replication problem will continue to worsen. But, if you do nothing, the extra IO of the SQL thread reading its relay logs may prevent it from catching up for even longer. Depending on how short on IOPS you are, once you start reading relay logs from disk instead of page cache, you may be unable to catch up at all!

So, if you want to have the effect of relay-log-space-limit, but you don’t want to restart MySQL to get it, this hack I call “pseudo-relay-log-space-limit.sh” might be just what you need:

# this is a stupid hack for situations where you have a lagged slave
# without relay-log-space-limit set which you want to be able to catch up, 
# without needing to restart and rewarm it.
# note: Seconds_Behind_Master is NULL while the IO thread is stopped.
while true; do
cur=$(echo 'show slave status\G'|mysql|grep Space|tee /dev/stderr|egrep -o '[^ ]+$')
if [ $cur -gt $max ]
then echo STOP SLAVE IO_THREAD  |tee /dev/stderr| mysql
else echo START SLAVE IO_THREAD |tee /dev/stderr| mysql
sleep 10