Possible Bug: lvm2-2.02.175-1.fc27

So I just updated this package and rebooted my system.  I then ran into an interesting issue: one of my guest domains refused to start.  An initial look at the system journal revealed messages like:

systemd-udevd[775]:  seq 3711 '/devices/virtual/block/dm-21' is taking a long time
systemd-udevd[775]:  seq 3710 '/devices/virtual/block/dm-23' is taking a long time

I use logical volumes as storage for some of my guest domains, including the one that refused to start.  Looking into the matter, I found that a process launched by the system startup routines was hung:

usr/bin/lvm pvscan --cache --activate ay 8:12

When I attached strace to the process (strace -p <PID>), I saw that it was blocked in a semaphore operation:

semop(262144, [{0, 0, 0}], 1
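For anyone unfamiliar with strace output, the blocked call above can be decoded field by field.  Below is a minimal sketch of a hypothetical helper (describe_semop is an illustrative name, not part of strace or LVM).  It assumes the older positional strace format shown above, where each struct sembuf is printed as {sem_num, sem_op, sem_flg}, and it applies the rules from the semop(2) man page.  Under those rules, a sem_op of 0 means "wait until the semaphore's value reaches zero".  So, as far as I can tell, pvscan was waiting for some other process to bring semaphore 0 of set 262144 down to zero, and that apparently never happened.  The trailing ")" is treated as optional because strace cannot finish printing a call that has not returned.

```python
import re

# Matches strace's older positional rendering of a semop() call, e.g.
#   semop(262144, [{0, 0, 0}], 1
# Only the first sembuf in the array is captured. The closing parenthesis is
# optional because a still-blocked call is printed without it.
SEMOP_RE = re.compile(
    r"semop\((\d+),\s*\[\{(\d+),\s*(-?\d+),\s*([^}]+?)\}\],\s*(\d+)"
)


def describe_semop(line):
    """Split a strace semop() line into its fields and explain sem_op per semop(2).

    The explanation ignores IPC_NOWAIT, which would turn a block into an EAGAIN error.
    """
    m = SEMOP_RE.search(line)
    if m is None:
        raise ValueError(f"not a recognised semop() line: {line!r}")
    semid, sem_num, sem_op, sem_flg, nsops = m.groups()
    sem_op = int(sem_op)
    if sem_op == 0:
        meaning = "block until the semaphore value becomes zero"
    elif sem_op > 0:
        meaning = f"add {sem_op} to the semaphore value (never blocks)"
    else:
        meaning = f"block until the value is at least {-sem_op}, then subtract {-sem_op}"
    return {
        "semid": int(semid),
        "sem_num": int(sem_num),
        "sem_op": sem_op,
        "sem_flg": sem_flg.strip(),
        "nsops": int(nsops),
        "meaning": meaning,
    }


if __name__ == "__main__":
    info = describe_semop("semop(262144, [{0, 0, 0}], 1")
    for key, value in info.items():
        print(f"{key}: {value}")
```

Newer strace versions print named fields ({sem_num=0, sem_op=0, sem_flg=0}), which this sketch does not handle.  On a live system, ipcs -s lists the semaphore sets and their owners, which is one way to see who else holds set 262144.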

Any lvm command that needed to acquire locks on the logical volumes or volume groups (e.g. vgscan, lvscan) would hang indefinitely.  Interrupting one would yield messages such as:

“Giving up waiting on lock”

So I looked in /run/lock/lvm/ and found four outstanding lock files.  Two of them referenced the UUID of the unresponsive guest domain's logical volume (which LVM had since suspended due to its obstinate behavior), one was a global lock for the physical volume hosting that logical volume, and one was a global lock for the volume group.
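Before deleting anything, it helps to record exactly what is sitting in the lock directory.  Here is a minimal read-only sketch (list_lock_files and report are hypothetical helper names) that lists each regular file in /run/lock/lvm with its size and modification time.  That would show whether the files are empty and roughly when they were left behind.  It makes no attempt to interpret LVM's lock file names, and it will most likely need to run as root, since that directory is usually not readable by ordinary users.

```python
import time
from pathlib import Path

DEFAULT_LOCK_DIR = "/run/lock/lvm"


def list_lock_files(lock_dir=DEFAULT_LOCK_DIR):
    """Return (name, size_in_bytes, mtime) for each regular file in lock_dir, sorted by name.

    Read-only: nothing is opened, locked, or removed.
    """
    entries = []
    for path in sorted(Path(lock_dir).iterdir()):
        if path.is_file():
            st = path.stat()
            entries.append((path.name, st.st_size, st.st_mtime))
    return entries


def report(lock_dir=DEFAULT_LOCK_DIR):
    """Print one line per lock file, or explain why the directory could not be read."""
    try:
        locks = list_lock_files(lock_dir)
    except FileNotFoundError:
        print(f"{lock_dir} does not exist")
        return
    except PermissionError:
        print(f"permission denied reading {lock_dir} (try running as root)")
        return
    if not locks:
        print(f"no lock files in {lock_dir}")
    for name, size, mtime in locks:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(f"{name}\t{size} bytes\tmodified {stamp}")


if __name__ == "__main__":
    report()
```

Removing the files is deliberately left out of the sketch; in my case that only happened after the hung pvscan process had been killed.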

After a bit of research, I was fairly convinced of two things: that the pvscan operation was merely waiting for these lock files to be removed (their removal had failed somewhere earlier, possibly because of a hypothetical bug), and that a firm SIGKILL would not muck up my system (SIGTERM had no effect).  So I executed kill -9 against the process, manually removed all of the lock files (they were empty), and rebooted the system.

The shutdown portion of the reboot did not go smoothly (systemd had to SIGKILL all of the other lvm2-pvscan@whatever.service processes), but when the system came back up, it was error-free and the guest domain started without hesitation.

This bug may involve the handling of logical volumes with snapshots (this was the only logical volume on my system with an active snapshot during the lvm2 package upgrade), or it may be something else entirely.  Seeing no reference to similar incidents on the Interwebs, I thought I’d post it here in case anyone goes looking for others facing the same issue.  If that happens, perhaps we can file a bug report.

This entry was posted in Information Technology.