MySQL Write Performance and I/O Barriers
Lately I’ve been using Warden to run some pretty large Magento sites, not only with Docker Desktop for macOS but also on Docker running natively on Fedora 31, which I’ve got set up on a pretty sweet Dell T5820 I picked up recently. My local network is set up to resolve *.test domains to the Fedora machine when Docker Desktop is stopped, letting me switch between the two easily. I use VSCode Remote SSH + PHP Intelephense by Ben Mewburn as my editor, and I sometimes experiment with PhpStorm over X11 in XQuartz. Perhaps I’ll blog about that setup some time, but for now I’m going to move on to the primary topic of this post: MySQL write optimization on Linux as it pertains to I/O barriers.
The inspiration for this came to me while working on a Magento migration project for a relatively large site, as I was testing a full database upgrade process. Now this database is not exactly what we would call “small,” as it is 50 GB uncompressed. Even if you eliminate all the “extras” like reports, indexes, search tables, integration logs, etc., it’s still 36 GB. Not super sized, but not exactly small either. For the migration tests I imported and ran with the full 50 GB dump, but for the A/B tests I ran to optimize MySQL on Fedora I imported the smaller (yet still pretty sizable) 36 GB data set, which compresses down to just under 4.2 GiB.
Importing this 4.2 GiB compressed SQL dump into the MariaDB 10.3 container on my Mac mini took 2 hrs and 20 minutes, but importing into the same MariaDB 10.3 container running natively on Fedora 31 took a whopping 4.5 hrs. So immediately I asked myself: why would importing onto this Fedora machine (which has better specs than my Mac mini) take so much longer? The Mac mini in question has an onboard SSD, 32 GB of RAM, and a 3.2 GHz clock speed. Fedora 31 WS is running on a Dell T5820 with an i9-9920X CPU @ 3.50 GHz, 64 GB DDR4, and a 1 TB M.2 NVMe SSD (Samsung Evo Plus; LUKS + LVM). The biggest difference is that on macOS the container runs within a VM, while on Fedora the container runs natively, so the Docker volume lives directly on the NVMe storage rather than in a virtualized storage format as it does on macOS. So why on earth was it slower running natively than virtualized? The answer lies in understanding that a VM using virtualized storage introduces a layer of disconnection between the guest kernel and the qcow2 or raw file the volume is actually written to on the host machine’s storage; in effect, the host’s own caching absorbs much of the flush activity that, on bare metal, has to make it all the way to the physical disk.
Disabling I/O Barriers for a 58% Reduction in Import Time
The long and short of it is that I did two things to dramatically improve the time it takes to import large database dumps (and improve MySQL’s I/O performance across the board):
- Created an additional logical volume in my primary LUKS + LVM storage pool, both to give Docker more space than my root partition has available and so I could tune the mount separately.
- Formatted the new volume as ext4 and mounted it at /var/lib/docker with the noatime and nobarrier mount options (see the sketch just below this list).
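For reference, here is a rough sketch of how those two steps might look on a setup like mine. This isn’t a transcript of exactly what I ran: the vgfedora volume group, pool00 thin pool, and 150G size come from the lsblk output further down, while the docker.old path is just a scratch location for illustration, and Docker should be stopped for the whole procedure.
# systemctl stop docker
# mv /var/lib/docker /var/lib/docker.old
# lvcreate -V 150G -T vgfedora/pool00 -n docker
# mkfs.ext4 /dev/vgfedora/docker
# echo '/dev/mapper/vgfedora-docker /var/lib/docker ext4 defaults,noatime,nobarrier 0 0' >> /etc/fstab
# mkdir /var/lib/docker && mount /var/lib/docker
# cp -a /var/lib/docker.old/. /var/lib/docker/
# systemctl start docker
Only the noatime and nobarrier options in the fstab entry matter for this post; everything else is standard LVM thin-provisioning housekeeping.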
The result was bringing the database import time down from 4.5 hrs to just 1 hr and 55 minutes. As you can see, write latency can really add up when doing lots of sequential writes.
Note that the noatime option doesn’t really affect write performance; it simply disables updating a file’s atime on access, which mainly helps read I/O (a micro-optimization by comparison). So I’m going to focus on the nobarrier option and what it does in the remainder of this post.
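If you already have a dedicated volume and just want to try these options before committing them to fstab, a remount is enough. A minimal sketch, assuming Docker (and thus MariaDB) is stopped so nothing is writing mid-change:
# systemctl stop docker
# mount -o remount,noatime,nobarrier /var/lib/docker
# mount -l | grep '/var/lib/docker '
# systemctl start docker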
Before I dive into what nobarrier does and why it cuts database import time down so dramatically, however, let’s see some numbers.
What My Partitioning Looks Like
Below is what my partitioning looks like. The SSD is encrypted with LUKS, with LVM thin-provisioned volumes inside the encrypted portion of the disk.
# lsblk -o name,fstype,type,size,mountpoint /dev/nvme0n1
NAME FSTYPE TYPE SIZE MOUNTPOINT
nvme0n1 disk 931.5G
├─nvme0n1p1 vfat part 600M /boot/efi
├─nvme0n1p2 ext4 part 1G /boot
└─nvme0n1p3 crypto_LUKS part 929.9G
└─luks-685cfd50-0737-4d76-bee1-c11e6e2f5825 LVM2_member crypt 929.9G
├─vgfedora-pool00_tmeta lvm 408M
│ └─vgfedora-pool00-tpool lvm 797.8G
│ ├─vgfedora-root ext4 lvm 70G /
│ ├─vgfedora-pool00 lvm 797.8G
│ ├─vgfedora-home ext4 lvm 577.8G /home
│ └─vgfedora-docker ext4 lvm 150G /var/lib/docker
├─vgfedora-pool00_tdata lvm 797.8G
│ └─vgfedora-pool00-tpool lvm 797.8G
│ ├─vgfedora-root ext4 lvm 70G /
│ ├─vgfedora-pool00 lvm 797.8G
│ ├─vgfedora-home ext4 lvm 577.8G /home
│ └─vgfedora-docker ext4 lvm 150G /var/lib/docker
└─vgfedora-swap swap lvm 31.3G [SWAP]
Importing with Default Mount Options
$ mount -l | grep '/var/lib/docker '
/dev/mapper/vgfedora-docker on /var/lib/docker type ext4 (rw,relatime,seclabel,stripe=16,x-systemd.device-timeout=0)
$ ./tools/init.sh --db-dump backfill/magento_prod-inc-trade-20200325.sql.gz
...
==> [11:49:51] Importing database
4.16GiB 4:30:56 [ 268KiB/s] [==================>] 100%
Importing with Custom Mount Options
$ mount -l | grep '/var/lib/docker '
/dev/mapper/vgfedora-docker on /var/lib/docker type ext4 (rw,noatime,seclabel,nobarrier,stripe=16,x-systemd.device-timeout=0)
$ ./tools/init.sh --db-dump backfill/magento_prod-inc-trade-20200325.sql.gz
...
==> [09:49:10] Importing database
4.16GiB 1:55:16 [ 631KiB/s] [==================>] 100%
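The init.sh command above is part of the project tooling I use with Warden; if you want to reproduce the same measurement against a plain dump, a pipeline along these lines gives you the same pv-style progress readout (the mariadb container name, root password, and magento schema name are placeholders):
$ pv backfill/magento_prod-inc-trade-20200325.sql.gz | gunzip \
    | docker exec -i mariadb mysql -uroot -pchangeme magento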
The End Result
I realized a whopping 58% reduction in time spent importing this database dump by setting nobarrier on the ext4 volume where MariaDB stores its data on my Fedora development machine. This is incredible! If you’re using Linux for MySQL, whether natively or with containers, imagine the amount of time you could save as a developer by creating a separate volume to store that MySQL data with I/O barriers disabled. Remember, the performance benefit here isn’t just for importing databases; it will improve throughput on anything with lots of sequential writes thanks to the decreased latency of fsync operations.
And as always, an obligatory warning: disabling barriers on disks that cannot guarantee their caches are properly written out in case of power failure can lead to severe filesystem corruption and data loss. As I noted above, since I have a separate volume and a UPS (plus it’s a development machine), I don’t care too much about the risks here; it’s worth the tradeoff for my use case. Don’t do this in production unless you know with certainty that the write controllers in use will properly preserve data in the event of power loss.
Nitty Gritty Details About The nobarrier Option
So what is this nobarrier (or barrier=0) mount option? Here is what the man page has to say:
barrier=0 / barrier=1
This enables/disables barriers. barrier=0 disables it, barrier=1 enables it. Write barriers enforce proper on-disk ordering of journal commits, making volatile disk write caches safe to use, at some performance penalty. The ext3 filesystem enables write barriers by default. Be sure to enable barriers unless your disks are battery-backed one way or another. Otherwise you risk filesystem corruption in case of power failure (source)
In simple terms, a barrier is an ordering point enforced by the kernel that works together with journaled file systems to ensure data corruption will not result from sudden power loss. Like any single point through which data must flow, this I/O “barrier” bottlenecks write I/O and slows it down. The same principle is part of why the NVMe protocol performs far better than the older AHCI protocol: 64K command queues versus a single command queue, and the media communicates directly with the CPU rather than going through a “middleman” SATA controller.
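If you want to see how much of this comes down to flush/fsync latency on your own storage, a quick fio run that calls fsync after every write approximates the pattern a database import generates. This is a rough sketch rather than a rigorous benchmark, and the directory path is just an example; run it once with barriers enabled and once with nobarrier and compare the write rate and sync latency numbers:
# fio --name=fsync-test --directory=/var/lib/docker \
    --rw=write --bs=16k --size=256m --fsync=1 --end_fsync=1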
How much of a bottleneck I/O barriers pose will naturally differ based on workload, storage medium, etc. Red Hat reports the performance impact of barriers as approximately 3% as of RHEL 6 and no longer recommends disabling them, but as we have seen, that figure doesn’t hold everywhere. My guess (and it is a guess) is that the 3% mark Red Hat mentions is tied to benchmarks done on magnetic storage with more typical, non-database write workloads; a web server, for example, will barely be affected by this change since in most cases it isn’t write heavy.
Further Reading Material
If you want to go deeper on the topic of I/O barriers and how they work, there is some good information available from Tejun Heo.
And lastly, there is a really good writeup on the Percona site covering mount options like the above and more (such as the queue schedulers: noop, deadline, cfq, etc.) for tuning SQL performance across XFS, ext4, and ZFS. The post also touches on a few things around swappiness and NUMA interleaving. Good stuff.