Using systemd and Logical Volume Manager Snapshot Volumes During Fedora Operating System Upgrades

With the looming end of support for Fedora 22, I thought I might write this up:

I just successfully made use of this solution (based on instructions from the excellent ArchWiki), and I thought I’d document it here, since it’s really a fantastic means by which to engage, with more confidence, the frequent large upgrades facing anyone running Fedora.  It’s the best distribution (in my estimation), since you get access to the latest and greatest as soon as reasonably possible, but that comes with some necessary sysadmin work.  Hopefully, the solution below will ease your nerves a bit!

Brief Theoretical Outline

Solution Overview

Basically, LVM provides the capability to make snapshot volumes based on other logical volumes on your system.  It uses a Copy-On-Write (COW) method: data about to be overwritten in the original logical volume is first copied to the snapshot volume, allowing you to effectively maintain a set of data which can be used to reconstruct the original logical volume in its exact state at the point in time at which the snapshot volume was created.

So, if we have a mechanism by which we can create snapshot volumes of the critical file systems on our server in consistent states (that’s very important, and I’ll explain the concern and the approach we take to work around it below) prior to our operating system upgrade, we have a means by which we can fall back in the event of a disaster.

Risk Assessment

This covers practically any real risk facing an OS upgrade, though a total destruction of one’s logical volume manager would, of course, render the solution inoperable, so it does not replace standard backup solutions.  Such a disaster would be rare: even if the OS refuses to boot, one can still use rescue media to boot into a separate operating system and, from within it, manipulate the logical volumes in the failed production system, merging the snapshot volumes back into the original volumes and thereby restoring the production system to its original state of operation.
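For the curious, the merge itself is a single LVM command.  Here’s a sketch, shown as a dry run that only prints the command (remove the echo to execute for real); the volume group and snapshot names are illustrative, so substitute your own:

```shell
# Dry-run sketch of the restore path, run from rescue media with the
# production file systems unmounted.  Names here are illustrative.
restore_snapshot() {
    snap="$1"    # volumeGroup/snapshotVolume, e.g. fedora-server/root-2016-06-21
    echo lvconvert --merge "$snap"
}

restore_snapshot fedora-server/root-2016-06-21
```

If the origin volume happens to be in use, lvconvert defers the merge until the volume’s next activation (typically the next boot); you can watch the merge progress in the Cpy%Sync column of lvs.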

Aside from a failure with the logical volume manager itself, one could conceivably wind up with a corrupt boot volume (not LVM-managed by default in Fedora), in which case one would need to use the rescue system to rebuild the boot partition.  Again, this is not a common failure (it has never happened to me), but capturing the data in your /boot partition prior to the upgrade operation with a simple tar file and placing it on separate backup media may be a good idea.  Restoring the boot partition from backup is a lot easier than attempting to remake initramfs files and other data expected by the GRUB configuration on the production system in order to boot it.
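A minimal sketch of that /boot capture, assuming your backup media is mounted at /mnt/backup (a hypothetical path; use your own mount point):

```shell
# Archive a boot directory to backup media, date-stamped to match the
# snapshot names used later.  The /mnt/backup destination is an
# assumption; point it at your own disconnected backup disk.
backup_boot() {
    src="$1"     # usually /boot
    dest="$2"    # e.g. /mnt/backup
    tar -czpf "$dest/boot-$(date +%F).tar.gz" \
        -C "$(dirname "$src")" "$(basename "$src")"
}

# backup_boot /boot /mnt/backup
```

Restoring is just the reverse: boot the rescue media, mount the damaged system, and extract the archive over its /boot.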

Criticality of Separate Backup Solutions

The advantage, however, is that your standard backup solution can remain a simple backup of all of your important data which would, nonetheless, require a potentially lengthy rebuild of your operating system (and you should have instructions for yourself in the event you need to do that).  This is a real advantage since image-based or other whole-system backup solutions are often difficult and time-consuming to maintain in production systems which demand limited downtime (of which I consider my home server to be one; I’ve got big important things to do!).

So make sure your important data is backed up to an independent system or disk which is entirely disconnected from the system we’re working on; that way a failure on the production system cannot adversely impact the integrity of the backup.  You know, standard best practice.  In the worst case scenario, should the LVM-based solution offered here fail, you will be able to at least rebuild the production system from scratch and the backup data you keep.

Considerations for Individual Implementations of this Solution

Once the snapshot volume is created, LVM maintains it for you.  The only consideration you need to make is how large a volume you would like to dedicate to the snapshot data.  Obviously, you need not create a snapshot volume larger than the original on which it is based (even if you overwrote every bit of data on the original, you would never need to store more than that on the snapshot volume), but you can typically get away with a significantly smaller snapshot volume if you don’t expect to actually overwrite all of the data on the original.

So, considering this information in the context of a large system upgrade, I can report that I am making use of only three logical volumes on my server:

  1. One 20 GB volume mounted to the root of the file tree ( / )
     - Only 1.3 GB of data exists here
  2. One 10 GB volume mounted to /home
     - Only 35 MB of data exists here
  3. One 20 GB volume mounted to /var
     - About 7.6 GB of data exists here, but that includes some OS image files for use in guest domains.

I try to keep my system rather minimal (it’s basically a minimal Fedora Server installation with the Virtualization Platform added and little else), so consider that when observing the volume statistics presented above.

To provide adequate snapshot volume sizes for the above volumes, I plan to create a 5 GB snapshot volume for the root volume, a 1 GB snapshot volume for the home volume, and a 5 GB snapshot volume for the var volume.  This covers all of the data in those volumes, so even if we overwrite it all, we should have enough space in the snapshot volumes to cover it.
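To size your own snapshots, check your actual usage first; df shows per-file-system usage, and the LVM tools show whether the volume group has room left for the snapshots (the fedora-server volume group name below is mine, so substitute yours):

```shell
# How much data does each file system actually hold?
df -h
# Does the volume group have enough free space (VFree) to carve out the
# planned snapshot volumes?  These need root:
#   sudo vgs fedora-server
#   sudo lvs fedora-server
```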

Instructions

Alright!  If you’ve determined what you need per the information provided above, and you’re ready to go all DevOps sysadmin on your own with the assistance of some guy on the Interwebs and all the guarantees that come with it (i.e. none), let’s commence!

Step 1:  Create the systemd Service Used to Create LVM Snapshots

First, we will configure systemd to create logical volume snapshots prior to the initialization of operating system components which may alter the contents of those volumes.  As noted above, this prevents creating snapshots of volumes on which active operating system components could be conducting work.  The OS often buffers data it intends to write to disk in your system’s RAM, yet considers that data to have been written and depends on that consideration for its successful operation.  A snapshot taken of a volume to which buffered data has not yet been written will fail to include that data, so if you were to restore the logical volume from the snapshot data, your OS would expect data to be there which would not be available for restoration, and the resulting corruption could render your system inoperable.

To address this, we’re going to order our operating system to create our logical volume snapshots prior to mounting the logical volumes during boot.  To do that, we make use of systemd’s awareness of targets, or stages of initialization.  For our purposes here, we are interested in local-fs-pre.target and local-fs.target.  The former target includes systemd actions explicitly scheduled to occur before local file systems are mounted (as described, for example, in /etc/fstab) and the latter includes systemd actions which take place to mount local file systems.

The service we author will instruct the OS, via systemd, to run its commands after the local-fs-pre.target actions have taken place, but before the local-fs.target actions (e.g. the local file system mounts) have taken place.  Our logical volume snapshots, therefore, will be taken before the original volumes are mounted, ensuring their consistent states at the time of snapshot creation (the OS won’t be buffering data to write to them because they’re not accessible in a standard manner for such activity!).

Our service will be a file in /etc/systemd/system/ (with the other service files), and I have called it LVMsnap.service.  It goes a little something like this:

[Unit]
Description=Used to create LVM snapshots in preparation for an upgrade
DefaultDependencies=no
Requires=local-fs-pre.target
After=local-fs-pre.target
Before=local-fs.target
Conflicts=shutdown.target

[Service]
Type=oneshot
ExecStart=/usr/bin/bash -c '/usr/sbin/lvcreate -L5G -n root-`date +%F` -s fedora-server/root'
ExecStart=/usr/bin/bash -c '/usr/sbin/lvcreate -L1G -n home-`date +%F` -s fedora-server/home'
ExecStart=/usr/bin/bash -c '/usr/sbin/lvcreate -L5G -n var-`date +%F` -s fedora-server/var'

[Install]
WantedBy=make-snapshots.target

As you can see, the service requires that the local-fs-pre.target has been met before it executes, and it must execute before the local-fs.target is executed (so it will occur before local file system mounts).

The service type is a “oneshot”, meaning it simply executes the listed commands and calls it a day (it doesn’t involve any long-running daemon processes or the monitoring that comes with them).  The ExecStart commands might seem a little strange at first; why am I calling bash and executing the lvcreate commands rather than just executing the lvcreate commands directly?  Well, the answer is that systemd service files do not allow for cool stuff like command substitution in the command syntax, so my nifty little date +%F command used to generate a date string to serve as part of the snapshot volume name won’t work if you don’t call bash to interpret the command substitution and handle it for you.
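You can see what the command substitution produces on its own; this is exactly the name suffix the ExecStart lines generate:

```shell
# Build the date-stamped snapshot name the same way the unit file does.
# systemd will not perform this substitution itself, hence the bash -c.
name="root-`date +%F`"
echo "$name"    # e.g. root-2016-06-21
```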

But basically, you can take those lvcreate commands and modify the logical volume paths and sizes to suit your needs (I’m just showing the default strings one might encounter, as relevant to my system description above).  Go for it!

Since the creation and removal of snapshot volumes can happen in real time on your OS without adverse impact, feel free to test your lvcreate commands exactly as written (even with the explicit bash call) in your terminal from within a root context (I recommend sudo).  You can check the condition of the created snapshot volumes by using the handy lvs command.

If everything looks good, simply execute lvremove <volumeGroup>/<snapshotVolume> to get rid of it.  It’ll ask you if you’re sure you want to remove an active volume, and that’s fine; remember, the snapshot volume only holds data copied into it which we might want back at some point, so removing it will have no adverse impact on the running system.

Now the one remaining item that you might wonder about is the make-snapshots.target reference.  What is that?  We make it next!

Step 2:  Create the systemd Target Used to Call the LVMsnap Service

Now we’ll create the target file which we can insert into the series of targets executed by systemd at system startup.  Targets are basically references to sets of services, and systemd executes the targets in the order specified to systemd by the administrator (or OS designer).  The new target file will live in the same location as the service file (/etc/systemd/system/) and I have called it, as you might suspect, make-snapshots.target.  It looks like this:

[Unit]
Description=Invoke the LVMsnap.service to create system snapshots in preparation for an upgrade
Requires=multi-user.target

Very simple.  As you can see, I require the multi-user.target for this particular target to operate, but you can change that to graphical.target (or whatever) if that’s how you typically boot your system (for, say, Fedora Workstation).

Step 3:  Enable the systemd Service for Use

I guess this could’ve been at the end of Step 1, or maybe it could’ve been Step 2; it doesn’t really matter, but make sure you do this (systemctl enable LVMsnap.service), or you may be looking through your systemd journal, staring at plain evidence that your target was reached, and wondering why it did not execute your service.  Well, if the service is disabled, systemd won’t run it!

Step 4:  Enjoy!

And that is it!  Now, to activate the target, you could either set it as the default before you reboot (systemctl set-default make-snapshots.target) or explicitly call the target from GRUB when booting (press e on your GRUB menu entry to edit it, then add ‘systemd.unit=make-snapshots.target’ to the end of the linux line).  I personally use the GRUB method, but if you’re managing the system remotely via nothing other than SSH, temporarily setting the default target could be an easier way to go.  Just make sure you set the default back to multi-user.target (or whatever) once you boot up, or you’ll be creating snapshot volumes every time you reboot.
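For the SSH-only route, the full round trip looks like this; it’s written as a dry run that only prints each command (remove the echo to run them for real), and the restored target name is just my default, so adjust to taste:

```shell
# Set the snapshot target as the default for one boot, then restore
# the usual default after the machine comes back up.
one_boot_with_snapshots() {
    restore="$1"    # your normal default, e.g. multi-user.target
    echo systemctl set-default make-snapshots.target
    echo reboot
    # ...log back in after the reboot, then:
    echo systemctl set-default "$restore"
}

one_boot_with_snapshots multi-user.target
```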

I make use of this setup to give myself confidence in upgrading Fedora and now you can, too!  Let me know if you face issues (or see errors or whatever) and I’ll be glad to help.

