lizard kernel oops on 4.9.0-8 kernel

Booting lizard on a 4.9.0-8 kernel resulted in a non-functional system spewing out the following oops:

Aug 22 08:27:57 lizard kernel: [ 387.963350] Oops: 0000 [#1] SMP

Aug 22 08:27:57 lizard kernel: [ 388.244093] Call Trace:
Aug 22 08:27:57 lizard kernel: [ 388.246539] [] ? do_huge_pmd_numa_page+0xa6/0x5c0
Aug 22 08:27:57 lizard kernel: [ 388.252884] [] ? handle_mm_fault+0x676/0x12b0
Aug 22 08:27:57 lizard kernel: [ 388.258888] [] ? __do_page_fault+0x255/0x4f0
Aug 22 08:27:57 lizard kernel: [ 388.264799] [] ? page_fault+0x28/0x30

Some suggestions on how to proceed:

- Perhaps we could reboot on the 4.9.0-8 kernel with the L1TF fixes disabled (IIRC parts of the mitigation can be enabled or disabled selectively). They're the only change in -8, so they're the likely cause of the trouble.

- Reboot on the 4.9.0-8 kernel with libvirtd and numad disabled from the kernel command line (IIRC systemd provides a way to prevent services from starting like this). Then log in, start numad and make sure it's really, really up and ready (Type=forking does not guarantee that the daemon is ready to answer requests), disable autostart for all VMs, start libvirtd, and start the biggest (RAM-wise) VMs one after the other, checking NUMA allocation after each. If there's no trouble, start everything else. This might help diagnose what's going on with NUMA.

- Reboot on a much newer kernel, in the hope that the problem lies in the backport of this big pile of fixes to 4.9 rather than in the fixes themselves.
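For the first option, here is a minimal Python sketch. It assumes that the 4.9.0-8 backport exposes the upstream `l1tf=` boot parameter and the `/sys/devices/system/cpu/vulnerabilities/l1tf` status file; neither is confirmed in this issue. The sketch reports the kernel's current L1TF state and builds a command line with `l1tf=off` appended. One caveat applies if the backport follows upstream: the upstream documentation describes `l1tf=off` as disabling the hypervisor-side mitigations only, while PTE inversion in the page tables stays permanently enabled. So this test may not rule out the L1TF changes entirely.

```python
"""Sketch: report L1TF status and build a kernel cmdline with l1tf=off.

Assumption (not confirmed in the issue): the 4.9.0-8 backport provides the
upstream `l1tf=` parameter and the sysfs vulnerabilities file.
"""
from pathlib import Path
from typing import Optional

L1TF_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")
PROC_CMDLINE = Path("/proc/cmdline")


def l1tf_status() -> Optional[str]:
    """Return the kernel's reported L1TF mitigation state, or None if absent."""
    try:
        return L1TF_SYSFS.read_text().strip()
    except OSError:
        return None


def set_param(cmdline: str, key: str, value: str) -> str:
    """Return cmdline with key=value, dropping any earlier `key` or `key=...` token."""
    kept = [t for t in cmdline.split() if t != key and not t.startswith(key + "=")]
    return " ".join(kept + [f"{key}={value}"])


if __name__ == "__main__":
    print("L1TF:", l1tf_status() or "not reported by this kernel")
    current = PROC_CMDLINE.read_text() if PROC_CMDLINE.exists() else ""
    print("try booting with:", set_param(current, "l1tf", "off"))
```

On Debian, you can try the resulting parameter for a single boot by editing the entry at the GRUB menu (press `e`). To make it persistent, add it to `GRUB_CMDLINE_LINUX` in `/etc/default/grub` and run `update-grub`.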
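For the second option, the "start the biggest VMs one after the other, check NUMA allocation" step could be scripted roughly as follows. This is a sketch under stated assumptions. The VM names and sizes are hypothetical placeholders. The check is one possible reading of "check NUMA allocation", not necessarily what numad or libvirt would do: it asks whether at least one NUMA node has enough free memory for the whole VM, based on `MemFree` in `/sys/devices/system/node/node*/meminfo`. The script only prints `virsh start` commands and starts nothing. The numad readiness check and disabling autostart are left to be done by hand.

```python
"""Sketch: plan VM starts biggest-first, checking per-node free memory.

Assumptions (hypothetical, not from the issue): the VM inventory below, and
"fits on a single NUMA node" as the criterion worth checking between starts.
"""
import glob
import re
from typing import Dict, List, Tuple

# Matches lines like "Node 0 MemFree:   123456 kB" in nodeN/meminfo.
MEMFREE_RE = re.compile(r"^Node\s+(\d+)\s+MemFree:\s+(\d+)\s+kB", re.MULTILINE)


def parse_memfree(meminfo_text: str) -> Dict[int, int]:
    """Map NUMA node id -> free memory in KiB from a node's meminfo text."""
    return {int(node): int(kib) for node, kib in MEMFREE_RE.findall(meminfo_text)}


def read_node_memfree() -> Dict[int, int]:
    """Read MemFree for every NUMA node from sysfs (empty dict if unavailable)."""
    free: Dict[int, int] = {}
    for path in glob.glob("/sys/devices/system/node/node*/meminfo"):
        with open(path) as f:
            free.update(parse_memfree(f.read()))
    return free


def largest_first(vms: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Order (name, RAM in MiB) pairs so the biggest VMs start first."""
    return sorted(vms, key=lambda vm: vm[1], reverse=True)


def fits_on_one_node(ram_mib: int, node_free_kib: Dict[int, int]) -> bool:
    """True if at least one node has enough free memory for the whole VM."""
    return any(kib >= ram_mib * 1024 for kib in node_free_kib.values())


if __name__ == "__main__":
    # Hypothetical inventory of (domain name, RAM in MiB); real sizes would
    # come from `virsh dominfo NAME` on lizard.
    vms = [("vm-small", 1024), ("vm-big", 16384), ("vm-medium", 4096)]
    for name, ram in largest_first(vms):
        # Re-read per VM: after a real start, the previous guest consumes memory.
        free = read_node_memfree()
        verdict = "fits on one node" if fits_on_one_node(ram, free) else "would span nodes"
        print(f"virsh start {name}  # {ram} MiB, {verdict}")
```

After each real `virsh start`, `numastat -p <qemu pid>` should show where the guest's memory actually landed.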

Related issues

Related to #11179 (closed)
Blocks #13242

Originally created by @groente on 15832 (Redmine)
