I’ve suffered a home file server with an overly complicated disk storage setup for a number of years: LUKS, MD RAID, and LVM with a caching setup on top of that. It’s mostly a toy used for non-critical backups, so it was fun to tinker with. However, I’ve suffered disk losses multiple times despite using supposedly decent quality NAS drives.
The latest configuration was 3x3TB RAID5 with an 80GB SSD LVM write-back cache and XFS on top. One disk suddenly developed 28k bad sectors, and that’s when I noticed a second disk had a few hundred. The server is used with Back In Time, so there are literally millions of tiny files. I decided to do a major cleanup of the backups, and that’s when I found corruption: attempts to delete certain snapshots would lock up the entire system with “CPU Stuck” errors and an XFS stack trace.
I had a lot of experience with ZFS on Solaris a lifetime ago. I had a critical production file server suffer multiple simultaneous hardware failures that ZFS magically resilvered itself out of. I even ran OpenSolaris at home until Larry destroyed it. I know ZFS on Linux has come a long way, and I wondered if my experience would be positive. I wanted a rock solid file system that could recover from errors. I didn’t really want to deal with separate volume management, RAID, and encryption layers. ZFS has always had volume management and RAID, and long ago added encryption. ZFS on Linux suffers from license conflicts that require it to be a separate DKMS-based install. Is it a PITA?
The ZFS on Linux Experience
# dnf install -y https://zfsonlinux.org/fedora/zfs-release$(rpm -E %dist).noarch.rpm
# dnf install -y zfs
# zfs --version
zfs-2.1.2-1
zfs-kmod-2.1.2-1
Well, that was hard. 🙂
Before creating my zpools and file systems, I knew I needed to figure out encryption. Unlike LUKS, there’s no support for prompting for a passphrase on boot and unlocking all applicable encrypted volumes. I did not want to have a zpool with multiple file systems and have to manually type a passphrase multiple times. I also really didn’t want a bash script iterating through them.
I realized (remembered?) I could use the -O option on zpool create to enable encryption on the root file system and then every file system would inherit it. There’s a downside in that you cannot disable encryption on child file systems, but I can’t think of a reason I wouldn’t want everything encrypted.
# zpool create disk raidz \
    -O encryption=on \
    -O keylocation=prompt \
    -O keyformat=passphrase \
    /dev/disk/by-id/ata-WDC_WD40EFZX-68AWUN0_WD-WX22DA1BA73D \
    /dev/disk/by-id/ata-WDC_WD40EFZX-68AWUN0_WD-WX22DA13XER7 \
    /dev/disk/by-id/ata-WDC_WD40EFZX-68AWUN0_WD-WX32D61FFEPY
Enter new passphrase:
Re-enter new passphrase:
# zpool list
NAME   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
disk  10.9T  1.26M  10.9T        -         -     0%     0%  1.00x    ONLINE  -
A quick reboot confirmed I wasn’t prompted for a password anywhere. zfs load-key disk confirmed I could mount all the file systems with one passphrase entry.
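The inheritance is easy to double check once a few child file systems exist. Here is a minimal sketch, assuming the tab-separated output of zfs get -H; the helper name unencrypted_datasets is my own invention, not part of ZFS.

```shell
#!/bin/sh
# Hypothetical helper: reads "name<TAB>value" lines on stdin, as printed by
#   zfs get -H -r -o name,value encryption <pool>
# and prints every dataset whose encryption property is "off".
unencrypted_datasets() {
    awk -F '\t' '$2 == "off" { print $1 }'
}
```

Running zfs get -H -r -o name,value encryption disk | unencrypted_datasets should print nothing if every file system in the pool inherited encryption from the root.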
I created a systemd service using systemd-ask-password to prompt for the passphrase and pass it to zfs load-key. This could be much more generic – if you use it, you’ll want to change the file system name at a minimum.
[Unit]
Description=Load ZFS encryption keys
DefaultDependencies=no
Before=systemd-user-sessions.service
Before=zfs-mount.service
After=zfs-import.target
After=plymouth-start.service systemd-vconsole-setup.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c '/usr/bin/systemd-ask-password "Enter passphrase for ZFS:" | zfs load-key disk'

[Install]
WantedBy=zfs-mount.service
Drop this in /etc/systemd/system/zfs-load-key.service and don’t forget to enable it. You can also start it and test it without rebooting:
# systemctl enable zfs-load-key
# systemctl start zfs-load-key
An actual reboot confirmed I was prompted for the passphrase right after the LUKS passphrase for my root volume. Success!
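For anyone who wants the more generic version, here is one way it might look: a rough sketch that asks ZFS which encryption roots are still locked instead of hard-coding the pool name. The function names and the --run flag are hypothetical, and I'm assuming the encryptionroot and keystatus properties report what their names suggest. It also splits on whitespace, so dataset names containing spaces are not handled.

```shell
#!/bin/sh
# Sketch only: pending_roots and load_pending_keys are made-up names.

# Reads "name<TAB>encryptionroot<TAB>keystatus" lines on stdin, as printed by
#   zfs list -H -o name,encryptionroot,keystatus
# and prints each dataset that is its own encryption root and still locked.
pending_roots() {
    awk -F '\t' '$1 == $2 && $3 == "unavailable" && !seen[$1]++ { print $1 }'
}

# Prompts once per locked encryption root and pipes the passphrase to zfs.
load_pending_keys() {
    roots=$(zfs list -H -o name,encryptionroot,keystatus | pending_roots)
    for root in $roots; do
        systemd-ask-password "Enter passphrase for ZFS ($root):" |
            zfs load-key "$root"
    done
}

# Only touch ZFS when explicitly asked to, so the file is safe to source.
if [ "${1:-}" = "--run" ]; then
    load_pending_keys
fi
```

Saved somewhere like /usr/local/sbin/zfs-load-keys (a path picked purely for illustration), the unit's ExecStart could call it with --run in place of the inline sh -c command.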
The original data (at least, what was retrievable) was backed up on a single external disk with rsync. I began to rsync from there onto the new ZFS file systems. On my second rsync to double check a file system with a large number of small files, I noticed it was quite slow and that the zpool showed a lot of write activity despite rsync not showing any:
# zpool iostat 5
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
disk        20.6G  10.9T    225      0  1.02M      0
disk        20.6G  10.9T    905      0  4.12M      0
disk        20.6G  10.9T    831      0  3.81M      0
disk        20.6G  10.9T    817      0  3.74M      0
disk        20.6G  10.9T    712      0  3.26M      0
disk        20.6G  10.9T    671      0  3.06M      0
disk        20.6G  10.9T    606      0  2.76M      0
disk        20.6G  10.9T    582      0  2.66M      0
disk        20.6G  10.9T    812     84  3.47M  1000K
disk        20.6G  10.9T    690    115  2.94M  1.23M
disk        20.6G  10.9T    759     65  3.22M  1.50M
disk        20.6G  10.9T    912    109  3.87M  1.67M
disk        20.6G  10.9T    728     46  3.10M  1.49M
disk        20.6G  10.9T    641     75  2.77M  1.51M
disk        20.6G  10.9T    729     93  3.18M  1.63M
disk        20.6G  10.9T    899    226  4.09M  2.88M
disk        20.7G  10.9T    883    237  4.00M  2.66M
disk        20.7G  10.9T    844    243  3.81M  2.57M
disk        20.7G  10.9T  1.03K     99  4.41M   918K
disk        20.6G  10.9T  1.12K     49  4.81M   768K
disk        20.6G  10.9T     1K     28  4.03M   194K
disk        20.6G  10.9T    684     78  2.67M   812K
disk        20.6G  10.9T  1.13K     43  4.54M   904K
disk        20.6G  10.9T    736     99  2.88M   899K
disk        20.6G  10.9T    635     82  2.48M   745K
disk        20.6G  10.9T    271     43  1.06M   827K
This immediately caused me to recall an important ZFS tuning tip from my distant past:
zfs set atime=off disk
The writes stopped immediately. 🙂
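If you'd rather not eyeball the iostat output, a small filter can count the intervals that saw writes. This is a sketch, assuming the scripted, parsable format of zpool iostat -H -p, where the fields are pool, alloc, free, read ops, write ops, read bandwidth, and write bandwidth. The name count_write_intervals is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `zpool iostat -H -p <pool> <interval> <count>`
# lines on stdin and reports how many intervals had any write operations.
# Field 5 is assumed to be the write operations column.
count_write_intervals() {
    awk '{ total++; if ($5 > 0) writes++ }
         END { printf "%d of %d intervals had writes\n", writes, total }'
}
```

Something like zpool iostat -y -H -p disk 5 12 | count_write_intervals samples a minute of activity; the -y flag skips the initial since-boot summary line. Running it before and after changing atime during a read-only rsync pass should show the difference, and zfs get -r atime disk shows whether the child file systems inherited the setting.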
I contemplated adding my former LVM cache SSD back. The original intent was to offload reads from slow NAS drives onto an old spare SSD. I’m not sure it was ever terribly effective. ZFS’s ARC is more intelligent than most LRU disk caches, so it may improve day to day performance without an SSD. There’s also a lot of RAM overhead in managing a ZFS L2ARC disk, and this is an older server with only 8GB. For now, we’ll see how it goes without an L2ARC.
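One way to judge whether an L2ARC would ever be worth its RAM is to watch how well the in-memory ARC is already doing. Here is a minimal sketch, assuming the kernel module exposes its counters in /proc/spl/kstat/zfs/arcstats as "name type data" rows that include hits and misses; arc_hit_ratio is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: prints the percentage of ARC lookups served from RAM.
# Takes an optional path to an arcstats-style file; defaults to the live kstat.
arc_hit_ratio() {
    awk '$1 == "hits"   { hits = $3 }
         $1 == "misses" { misses = $3 }
         END {
             total = hits + misses
             if (total > 0) printf "%.1f\n", 100 * hits / total
             else print "n/a"
         }' "${1:-/proc/spl/kstat/zfs/arcstats}"
}
```

Calling arc_hit_ratio after a few days of normal use gives a rough number. The counters are cumulative since the module loaded, so a consistently high figure suggests the SSD would add little, while a low one on a read-heavy workload would be the case for revisiting it.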
I’m now enjoying a simpler setup, simpler management, and far better protection of my data than XFS offered. Despite not being part of the Linux kernel or Fedora, ZFS was trivial to install and get running. The only inconvenience I found was the need to add a systemd service to prompt for the passphrase. So far, I’m quite happy with the migration from LUKS+MD+LVM+XFS to ZFS.