Bug Report: mutter-3.22.2-1.fc25 Breaks GDM and GNOME for Proprietary NVidia Drivers

MAN.  That took hours to figure out.  I simply ran a dnf upgrade on my Fedora 25 system (using an NVidia GeForce GTX 960 with the proprietary drivers) and, when it rebooted, I was greeted with a failure to start GDM.  The system would simply hang at the text list of services being started.

Switching into another terminal, I found error messages such as these repeated with every effort to start GDM:

Dec 01 15:07:04 hostname gdm[1181]: GLib: g_hash_table_find: assertion ‘version == hash_table->version’ failed
Dec 01 15:06:51 hostname abrt-hook-ccpp[1460]: Process 1455 (gnome-session-failed) of user 42 killed by SIGSEGV – dumping core

Since these were segmentation faults, I suspected I needed to reinstall my NVidia drivers (which seems to be necessary once in a while after kernel or mesa updates).  I tried this, to no avail.  Many times.

Because I’m a moron and was filtering my journal for error-level messages only, I didn’t notice this helpful message was also being logged:

Dec 01 15:06:51 hostname kernel: gnome-session-f[9433]: segfault at 0 ip 00007f23d0001579 sp 00007ffd1a6248c0 error 4 in libgtk-3.so.0.2200.4[7f23cfd23000+6f0000]
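In hindsight, the catch is that the kernel usually logs these segfault notices at info priority, so a journal filtered down to error level hides them. A minimal sketch, assuming a persistent journal and that the crash happened on the previous boot (-b -1); faulting_object and recent_segfaults are hypothetical helper names of my own:

```shell
#!/usr/bin/env bash
# Print the shared object named at the end of kernel "segfault at" lines read on stdin,
# e.g. "... error 4 in libgtk-3.so.0.2200.4[7f23cfd23000+6f0000]" -> libgtk-3.so.0.2200.4
faulting_object() {
  sed -n 's/.*segfault at .* in \([^[]*\)\[.*/\1/p'
}

# Kernel messages from the previous boot at every priority (no -p err filter),
# keeping only the segfault notices.
recent_segfaults() {
  journalctl -k -b -1 --no-pager | grep 'segfault at'
}
```

Something like recent_segfaults | faulting_object | sort | uniq -c should have pointed straight at libgtk-3 here.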

So I started checking around for other users hitting the same segmentation fault, and lo and behold, I hit upon this over at the Arch Linux forums.  If I’m reading the tail end of the related bug report properly, a fix was made upstream, but given that Fedora 25 is still hosting mutter-3.22.2-1.fc25 for download, we may still be waiting on a package that includes the fix.  Or it may be that the fix included in 3.22.2-1 is incomplete (just a few days ago, a user at the tail end of the forum thread reported that the package with the apparent fix did not help).

So anyway, if you, too, are experiencing GDM/GNOME session failures with the proprietary NVidia drivers on Fedora 25, try simply downgrading mutter:

[you@yourplace ~]$ sudo dnf downgrade mutter

That brought me back to mutter-3.22.1-8.fc25.x86_64 and all is well for me.
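The downgrade can also be made conditional on the broken build actually being installed. A small sketch under that assumption; is_broken_mutter, fix_mutter, and the BROKEN_MUTTER variable are hypothetical names, and the version string is the one from this post:

```shell
#!/usr/bin/env bash
# Sketch: downgrade mutter only when the build reported broken here is installed.
BROKEN_MUTTER='mutter-3.22.2-1.fc25'   # version string from this post

# Succeeds when $1 (an "rpm -q" style name like mutter-3.22.2-1.fc25.x86_64)
# is the broken build.
is_broken_mutter() {
  case "$1" in
    "$BROKEN_MUTTER".*) return 0 ;;
    *) return 1 ;;
  esac
}

fix_mutter() {
  local installed
  installed="$(rpm -q mutter)" || return 1
  if is_broken_mutter "$installed"; then
    sudo dnf downgrade mutter
  else
    echo "mutter is $installed; leaving it alone"
  fi
}
```

Until a fixed build lands, running sudo dnf upgrade --exclude=mutter should keep a routine upgrade from pulling 3.22.2-1 straight back in.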

Now to get rid of lightdm and roll back all my insane troubleshooting installations…

Posted in Information Technology

Adding a New Boot Disk in Fedora 23 with UEFI

Well THAT was a hell of a learning experience.  You know, I spent all this time learning about GRUB 2, the MBR, and bootloader operation in general, and just when I thought I really had the bootloader stuff down, I realized I had to learn about UEFI before I could consider myself to have mastered the area from a system administration perspective.

But that was a lot of junk to read, so I didn’t.

So then, I found I needed to swap out an SSD in my Fedora system so that I could replace it with a larger drive.  Of course, it had to be my boot volume.

THUS BEGAN MY QUEST.  I read quite a lot.  I can recommend this guy for a high-quality informal read on BIOS and UEFI technology.  But, I think I can make a quick and dirty SysAdmin overview for ya:

  1. As you know, Fedora boots using the GRand Unified Bootloader (GRUB) version 2.
  2. As you may know, GRUB 2 operates either with BIOS or UEFI firmware.
    1. If operating with BIOS firmware, GRUB 2 installs bootloader code at the beginning of a bootable disk, in what’s known as the Master Boot Record.  Even more particularly, it installs to a very small space at the start of the disk, within the MBR, ahead of the partition information (and then a second “Stage 1.5” is installed in a subsequent space).  Many people have experience with bootloaders wiping one another out (such as, say, Microsoft Windows and Fedora) during installation as they contend for this same extremely limited space.
    2. If operating with UEFI firmware, GRUB 2 installs bootloader code in a special EFI partition which must be formatted with the FAT 12, 16, or 32 file system (use mkfs.vfat in Fedora to create any of those file system types).  Technically, the UEFI specification describes a very specific implementation of the FAT file system, but Fedora’s mkfs.vfat command seems to produce file systems of UEFI’s liking.
      1. Once the UEFI bootloader code is in place, there is a final important step for system administrators, and that is to update the UEFI firmware’s boot manager to point to the new code.
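Since everything that follows depends on which firmware mode the system actually booted in, a quick check helps. A minimal sketch (boot_mode and its optional root-prefix argument are my own, for illustration), relying on the kernel creating /sys/firmware/efi only when it was started through UEFI:

```shell
#!/usr/bin/env bash
# Report whether the running system was started by UEFI or legacy BIOS firmware.
# The kernel exposes /sys/firmware/efi only when it was booted through UEFI.
# The optional $1 prefixes the path so the check can be tried against a test tree.
boot_mode() {
  local root="${1:-}"
  if [ -d "$root/sys/firmware/efi" ]; then
    echo UEFI
  else
    echo BIOS
  fi
}
```

If boot_mode prints BIOS on your rig, the efibootmgr business doesn’t apply and you’re back in MBR land.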

That last step there was what had me hung up for about an hour.  Fedora, and GNU/Linux distributions in general, use a tool called efibootmgr to control the UEFI firmware’s boot manager from within the operating system.  That’s pretty sweet.  Amazingly enough, your motherboard is not likely to provide as much capability in managing the UEFI boot manager as this handy tool.  My motherboard doesn’t even seem to let me create boot entries within the UEFI interface, so I have to rely on efibootmgr.

If you check out the man page, you’ll see some pretty standard options.  Basically:

  • Use efibootmgr -v to list the boot manager entries in your UEFI firmware.
  • Use grep efibootmgr /var/log/anaconda/program.log to locate the command used by Fedora when your OS was installed.  It will look something like this:
    • efibootmgr -c -w -L Fedora -d /dev/sde -p 1 -l \EFI\fedora\shim.efi
      • The “-c” option creates a new boot entry
      • The “-w” option writes a signature to the MBR if necessary (which it is not, in a UEFI environment, so this can probably be dropped)
      • The “-L” option creates the name for the boot entry which you will see in your UEFI firmware
      • The “-d” option points to the disk device on which the EFI System Partition (your FAT file system) resides
      • The “-p” option indicates the partition number on the disk device on which the EFI Partition resides
      • The “-l” option points to the code within the EFI partition which should be executed first by the UEFI firmware.
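Since efibootmgr can’t rename entries, stale ones eventually need deleting, and that means finding their boot numbers. A rough sketch of pulling them out of efibootmgr’s listing by label; efi_entries_for_label is a hypothetical helper, and it assumes the usual listing format (BootXXXX, an asterisk for active entries, and with -v a tab before the device path):

```shell
#!/usr/bin/env bash
# Print the boot numbers (e.g. 0001) of UEFI boot entries whose label is exactly $1.
# Reads the output of "efibootmgr" or "efibootmgr -v" on stdin.
efi_entries_for_label() {
  awk -v want="$1" '
    /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
      num = substr($0, 5, 4)                        # the XXXX in BootXXXX
      label = $0
      sub(/^Boot[0-9A-Fa-f]+\*?[ ]+/, "", label)    # drop "Boot0001* "
      sub(/\t.*/, "", label)                        # drop the -v device path, if any
      if (label == want) print num
    }'
}
```

For example, sudo efibootmgr | efi_entries_for_label Fedora lists the candidates, and sudo efibootmgr -b 0001 -B would then delete entry 0001 (substitute the number you actually want gone).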

Now, armed with this super secret knowledge, you will be able to easily and handily create a new boot device on your Fedora system.  Really, all you have to do is:

  1. Create the necessary EFI and boot partitions on the new device
    1. sudo cfdisk /dev/sdb or whatever and make a 200MB EFI partition (set its type to “EFI System”) and a 500MB boot partition.
  2. Create the necessary file systems in the new partitions
    1. sudo mkfs.vfat /dev/sdb1
    2. sudo mkfs.ext4 /dev/sdb2
  3. Make some temporary locations and mount the partitions to them so you can modify their contents
    1. sudo mkdir /mnt/boot2 /mnt/efi2
    2. sudo mount /dev/sdb1 /mnt/efi2 && sudo mount /dev/sdb2 /mnt/boot2
  4. And then just rsync over your current boot and EFI partitions:
    1. sudo rsync -a /boot/ /mnt/boot2
    2. sudo rm -r /mnt/boot2/efi/EFI
    3. sudo rsync -a /boot/efi/ /mnt/efi2
  5. Now just fix your /etc/fstab so that the boot and EFI partitions point to the right new UUIDs
    1. Obtain the file system UUIDs from cfdisk (displayed at the bottom)
    2. Swap UUIDs in /etc/fstab
  6. And finally, use efibootmgr to create a new boot entry for your system which points to the proper boot device.
    1. You need only change the -d option in the command from your Anaconda program.log.
    2. You cannot rename UEFI boot manager entries with efibootmgr (sadly), so just delete the old entry after you prove that your system boots with the new entry.
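Steps 5 and 6 are the fiddly ones, so here is a hedged sketch of helpers for them. The names fs_uuid, swap_uuid, and rewrite_disk_option are hypothetical; blkid is an alternative I’m substituting for reading UUIDs off cfdisk; and rewrite_disk_option assumes the logged command ends with the -l loader path, which it single-quotes because bash would otherwise eat the backslashes in \EFI\fedora\shim.efi:

```shell
#!/usr/bin/env bash
# Helpers for steps 5 and 6 (run as root). Names are hypothetical.

# Step 5: print a partition's file system UUID (blkid instead of cfdisk).
fs_uuid() {
  blkid -s UUID -o value "$1"
}

# Step 5: swap one UUID for another in an fstab-style file, keeping FILE.bak.
swap_uuid() {
  local old="$1" new="$2" file="${3:-/etc/fstab}"
  sed -i.bak "s/UUID=$old/UUID=$new/" "$file"
}

# Step 6: read Anaconda's logged efibootmgr line on stdin and point -d at $1.
# Also single-quotes the trailing -l path so bash keeps its backslashes.
rewrite_disk_option() {
  local new_disk="$1"
  grep -o 'efibootmgr .*' |
    sed -E -e "s#-d /dev/[^ ]+#-d $new_disk#" \
           -e "s#-l ([^ ']+)\$#-l '\\1'#"
}
```

Something like sudo grep efibootmgr /var/log/anaconda/program.log | rewrite_disk_option /dev/sdb prints a command to review before running it with sudo; if the log holds more than one efibootmgr line, pick the one with -c.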

And that is it!  Fantastico.

So whereas with older MBR/BIOS systems you need to reinstall GRUB after installing Microsoft Windows on the same disk as your Fedora system (to overwrite the Windows bootloader in the MBR and then chainload Windows with GRUB, making GRUB the sole true bootloader for the system), with UEFI you have more options.  You could create two separate EFI partitions for Windows and Fedora.  Or you could put all the bootloader code in a single EFI partition and use efibootmgr to create separate boot entries in your UEFI firmware, each pointing to the same disk and partition but to separate bootloader code for each OS.

It’s actually a lot easier to manage, but getting it right requires this additional understanding.  Once you know the sequence of events and the relationships between the components, troubleshooting becomes much simpler.  If you are no longer able to boot Microsoft Windows on a dual-boot system from within your UEFI firmware, for example, you now know you simply need to boot into your GNU/Linux OS and use efibootmgr to create the appropriate entry.  If your Windows EFI partition was overwritten or the code was lost, you can try Microsoft’s repair utilities (as described here) to fix that.
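Recreating a lost Windows entry is the same efibootmgr -c dance with a different loader. A sketch assuming Windows’ standard loader path, \EFI\Microsoft\Boot\bootmgfw.efi, on the EFI System Partition you pass in; add_windows_entry and the DRY_RUN switch are my own additions, so check the printed command before running it for real:

```shell
#!/usr/bin/env bash
# Sketch: recreate a "Windows Boot Manager" entry from within GNU/Linux.
# $1: disk holding the EFI System Partition, $2: its partition number.
# Assumes Windows' loader sits at the standard \EFI\Microsoft\Boot\bootmgfw.efi.
# DRY_RUN=1 prints the command instead of running it.
add_windows_entry() {
  local disk="$1" part="$2"
  local -a cmd
  # The single quotes keep bash from interpreting the backslashes.
  cmd=(efibootmgr -c -d "$disk" -p "$part"
       -L "Windows Boot Manager"
       -l '\EFI\Microsoft\Boot\bootmgfw.efi')
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    sudo "${cmd[@]}"
  fi
}
```

DRY_RUN=1 add_windows_entry /dev/sda 2 prints the command; drop DRY_RUN to execute it.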

Posted in Information Technology

Upgrading a KVM/QEMU Windows Guest Domain to Windows 10

You are likely to run into the following error during the upgrade process:

Windows 10 installation failed in SAFE_OS phase with error during boot

The error code reported to you when you reboot the system and it rolls back changes into the old OS is likely:

0xC1900101 – 0x20017

This appears to occur because Windows doesn’t care for the QEMU-provided CPU unless you’re using (as far as I am aware) the core2duo emulation option.

So, simply shut down the guest domain, choose “core2duo” as the CPU type, reduce your processor count to 2 (if necessary), and retry the operation.  It succeeded for me!  After the upgrade, it seems you can reset the CPU to host (or whatever you use) and increase the processor count without issue.
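The same workaround can be scripted from the host with libvirt’s tools instead of virt-manager. A sketch assuming a libvirt-managed guest and that virt-xml (shipped with virt-install) accepts --cpu and --vcpus edits on your version (check virt-xml’s man page); the function names, the DRY_RUN switch, and the host-passthrough example are mine, so substitute whatever CPU mode you normally use:

```shell
#!/usr/bin/env bash
# Sketch: switch a libvirt guest to the core2duo CPU model and 2 vCPUs for the
# Windows 10 upgrade, then put things back afterwards.
# Assumptions: virsh and virt-xml are installed; DRY_RUN=1 only prints commands.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Wait until the guest has actually powered off (virsh shutdown returns early).
wait_shut_off() {
  [ "${DRY_RUN:-0}" = 1 ] && return 0
  while [ "$(virsh domstate "$1")" != "shut off" ]; do sleep 2; done
}

prepare_for_win10_upgrade() {
  local dom="$1"
  run virsh shutdown "$dom"
  wait_shut_off "$dom"
  run virt-xml "$dom" --edit --cpu core2duo
  run virt-xml "$dom" --edit --vcpus 2
  run virsh start "$dom"
}

# $2: CPU mode/model to restore (e.g. host-passthrough), $3: vCPU count.
restore_after_upgrade() {
  local dom="$1" cpu="$2" vcpus="$3"
  run virsh shutdown "$dom"
  wait_shut_off "$dom"
  run virt-xml "$dom" --edit --cpu "$cpu"
  run virt-xml "$dom" --edit --vcpus "$vcpus"
  run virsh start "$dom"
}
```

For example: prepare_for_win10_upgrade win10, run the upgrade inside the guest, then restore_after_upgrade win10 host-passthrough 4.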

Additional note:  If you’re using the Red Hat VirtIO drivers in Windows 7, Windows 10 will work with those drivers during the upgrade process without an issue.


Posted in Information Technology

Extending Storage for an LVM-Backed Windows Guest Domain with KVM/QEMU

Well that was significantly more painful than I had anticipated.  Here’s the quick and dirty instruction set which involves multiple tools, perhaps needlessly, since I was investigating the issue for some time:

  1. Shut down the guest domain (someone might be able to whip up an online resize method, but given the GPT modifications I required, I’m not sure).
  2. Extend the logical volume providing the guest domain’s storage (I’m using raw storage on an LVM) with lvextend as usual.
    1. Example:  lvextend guests/domainVolume -L +40G
  3. Open up the logical volume with gdisk and repair the now-corrupt GPTs (both primary and backup) so that they properly recognize the disk size.
    1. If gdisk detects the GPTs as valid, perform a ‘v’ (to verify the disk).  The gdisk utility will inform you that the secondary header’s self-pointer indicates that it doesn’t reside at the end of the disk.  You may then use the ‘x’ option to enter the experts’ menu, where you can use the ‘e’ option to relocate the secondary header to the end of the now-extended volume, and then ‘w’ to write the changes to disk.
  4. Start up the guest domain
    1. Within Windows’ Virtual Disk Manager, I noted that my partition layout was correct (with the system volume extended and everything); this was likely because I had attempted to extend the system partition previously (between steps 2 and 3) and was receiving “Invalid Operation” errors from Windows.  It probably made some headway but failed halfway through, so the partition was properly recognized as extended after I fixed the GPT.
  5. Use Windows diskpart to select the volume targeted for extension (e.g. select volume 3), and extend the filesystem on the volume (extend filesystem).
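The host-side half of the steps above (1 through 4) can be condensed into a script. A sketch, run as root, assuming the VG/LV naming from the lvextend example and substituting sgdisk -e, the non-interactive counterpart of gdisk’s x, e, w sequence, for the interactive session; extend_guest_disk, run, and DRY_RUN are hypothetical names:

```shell
#!/usr/bin/env bash
# Host-side sketch of steps 1-4 for an LVM-backed guest disk (run as root).
# Assumptions: $2 is VG/LV as passed to lvextend, so the block device is /dev/VG/LV,
# and DRY_RUN=1 prints the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

extend_guest_disk() {
  local dom="$1" lv="$2" size="$3"

  # 1. Shut the guest down and wait for it to power off.
  run virsh shutdown "$dom"
  if [ "${DRY_RUN:-0}" != 1 ]; then
    while [ "$(virsh domstate "$dom")" != "shut off" ]; do sleep 2; done
  fi

  # 2. Grow the logical volume, e.g. size=+40G.
  run lvextend -L "$size" "$lv"

  # 3. Move the backup GPT structures to the new end of the volume
  #    (what gdisk's x, e, w does interactively), then verify the result.
  run sgdisk -e "/dev/$lv"
  run sgdisk -v "/dev/$lv"

  # 4. Start the guest again; extending the file system happens in Windows.
  run virsh start "$dom"
}
```

Step 5 still happens inside the guest with diskpart.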

That is more of a pain than it should be.  Silly Windows VDS.

Posted in Information Technology

If trump Found the Ring of Gyges..

You know, I actually said to the illustrious philosoraptor a while back, “If that guy found the Ring of Gyges, people would start disappearing.”  And now we have a little insight into just how accurate that assessment seems to have been.

It is really pretty amazing how fervently trump supporters are defending his words as “locker room banter” or what the hell ever.

First, a wee, less important point: I have a hard time thinking most men aren’t garbage. That is my reflexive, perhaps largely emotional position on the general matter. Despite that, and despite having been exposed to my fair share of private interpersonal dialogue between men who actually were garbage, only once or twice did the content of those dialogues resemble trump’s speech. Given my general inclination to believe that most men are garbage, I guess I don’t find it incredibly hard to believe that a large portion of this country is defending his words by professing abject disbelief that other men might not speak like this in private. But given that trump’s particularly vile speech is an order of magnitude or so beyond what I have typically felt earns men the classification of “garbage,” it is kinda surprising, actually.

The second, more important point, however, seems to be completely eclipsed by the fact that trump was speaking about his penchant for casual sexual assault. The most important point, I think, is that we have this guy, somehow being considered for the office of the President of the United States, on tape saying that he can freely express his penchant for sexual assault because he’s “a star” (loosely construed, apparently), and so he can do anything he wants.

Of course, up until this point it should have been an easy inference for anyone to make that trump is approximately this kind of person. He does not seem to care for anything other than his own aggrandizement and pleasure, and he has repeatedly voiced positions that would have disqualified him from mainstream Republican support were the Republicans not already self-whipped into a near-completely-irrational frenzy over the grossly exaggerated failings of HRC. But now we have actual, direct evidence that he takes whatever advantage is conferred upon him by women’s (much) lower position of social power to do nothing less than sexually assault them for fun.

It is a testament in favor of all those great thinkers of history who thought democracy untenable that the completely irrational means by which so many of our country’s denizens make their decisions should be laid so plainly bare before us all.

HRC is not a great choice. I’ve even written that, were we a stronger nation, she would indeed face jail for her willful negligence in handling our national security. But, sadly, we have a simple choice: it’s her or far, far, unambiguously disastrously far worse. Trump has all the makings of the next Nixon, though without any of Nixon’s good qualities. You can rightfully bemoan the two-party system. You can bemoan the poor choice we have in HRC. But in my estimation, the actually unprecedented risk to the nation posed by Trump merits extreme caution and discipline, which mandate of us all a vote for HRC. And if she is evaluated objectively, again sadly, it will be found that she is not that much worse than a typical American political candidate.

I don’t like the status quo. I want to see something awesome happen, and I don’t think HRC will bring that about, but I’ll be damned if I cast my vote in any way other than the one that seeks most efficaciously to prevent someone like trump from holding our nation’s highest office.

Posted in Politics

Linux Kernel 4.7 Memory Allocation Bug (mm/slub.c)

Just a heads up if you’re seeing unpalatable behavior with your Fedora 23 or Fedora 24 rig running any of the 4.7.* kernels:  It looks like Kernel 4.7 may have introduced a memory management bug, possibly as a result of commits to the slub.c code, an example of which can be found here.

I was reluctant to blame the kernel since I’m making use of Tianocore firmware in running a fairly sophisticated virtualization platform, but after running known good versions of the firmware and a vast array of combinations of operations, it appears the issue is likely with the kernel.  The fact that two individuals have filed bug reports involving the exact same line of slub.c code which I am seeing referenced in my journal (mm/slub.c:3661), and the fact that this code was updated with Kernel 4.7, give me some confidence in the diagnosis.

The issue as documented in the bug reports spans two fairly distinct server services (NFS and Virtualization), with my issue falling into the latter category.  Both of these services, of course, are tightly integrated with the kernel, and both reports appear to show the crash occurring during kernel memory management operations (kfree and kmalloc calls).  With NFS, the sunrpc do_cache_clean process looks to be causing issues across the Interwebs.  With KVM/QEMU, VMs making use of host-based USB devices likely cause crashes when they are shut down, because the kernel is then remapping those devices to memory spaces for the KVM/QEMU host OS.

This is all pretty good diagnostic information, I imagine; a more knowledgeable kernel programmer will likely be able to determine the cause of the issue here far more readily and precisely than I, so I’m hopeful that the issue is addressed with kernel 4.8.

Kernel 4.6.* works without a problem, but, of course, one ought not to run old kernels if it can be avoided at all.  Unfortunately, the order to upgrade from the 4.6.7 kernel has already been given.
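If you need to sit on 4.6.* for a while, Fedora’s grubby can make it the default boot entry. A sketch under these assumptions: a 4.6 kernel is still installed (dnf only keeps a few kernels by default), the helper names are hypothetical, and the slub.c line number is the one from my journal:

```shell
#!/usr/bin/env bash
# Sketch: boot the newest installed kernel from a known-good series by default,
# and count slub.c warnings in a log. Helper names are hypothetical.

# Reads kernel image paths on stdin; prints the newest one from series $1 (e.g. 4.6).
newest_kernel_in_series() {
  grep -F "/vmlinuz-$1." | sort -V | tail -n 1
}

# Reads log text on stdin; prints how many lines mention mm/slub.c:3661.
count_slub_warnings() {
  grep -c 'mm/slub.c:3661' || true
}

# Make the newest installed kernel from series $1 the default boot entry.
use_known_good_kernel() {
  local k
  k="$(printf '%s\n' /boot/vmlinuz-* | newest_kernel_in_series "$1")"
  if [ -z "$k" ]; then
    echo "no $1 kernel installed" >&2
    return 1
  fi
  sudo grubby --set-default "$k"
}
```

journalctl -k -b -1 | count_slub_warnings gives a quick before/after comparison, and use_known_good_kernel 4.6 switches the default entry.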


Posted in Information Technology

Navigating the Tianocore UEFI Shell

Just in case you end up getting dropped to the UEFI shell when attempting to start a Windows guest domain with KVM/QEMU/Tianocore, I thought I might post a little shot from Alex Williamson’s extremely helpful series (to which I’ve already linked in my Windows VM + PCI Passthrough instructions):

I’m not sure why I got dropped in there since the VM has been working without issue for a while, but I’ll update the post if I figure out the cause.

Update:  Looks like it was due to an upgrade of the Tianocore firmware.

Posted in Information Technology