Recently Intel started including their graphics drivers in the mainline Linux kernel. This is great, except when it stops working. Having suffered intermittent GPU freezes on my Lenovo x270 (Kabylake) work laptop since Kernel 4.12+, I came across a bug report that seemed related. Here’s my temporary fix on Fedora 28 for getting things stable again until it’s fixed for good upstream.
The Intel Integrated Graphics Crash
I started having complete system freezes intermittently where my laptop display would shudder / jitter and then hard lock. There was no real pattern to this happening nor was I able to get any log file information or journal information about the issue as it completely froze. No network, no ping, nothing.
VGA compatible controller: Intel Corporation HD Graphics 620 (rev 02)
I’ve had this issue on both Skylake and Kabylake Intel-based laptop systems with integrated HD graphics (i915 driver) across Kernels 4.12+ through the 4.16.8 Fedora 28 Kernel. I recall this also happened on my previous Lenovo x240 though less frequently.
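I never got anything useful out of the logs, but if you want to check your own machine after a forced reboot, a rough sketch might look like the following. It assumes a persistent systemd journal, the function name `gpu_log_lines` is made up for illustration, and the two patterns are simply the strings I would look for first, not an exhaustive list.

```shell
# Filter kernel log text on stdin for lines that mention the i915 driver
# or a GPU hang. gpu_log_lines is a hypothetical helper name.
gpu_log_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -iE 'i915|gpu hang' || true
}

# Typical use after a forced reboot (needs a persistent journal):
#   journalctl -k -b -1 --no-pager | gpu_log_lines
```

If this prints nothing, the freeze most likely happened before anything could be written out, which matches what I saw.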
I am sure at some point this will be permanently fixed and trickle down to all the major Linux distributions. I was impatient and frustrated and wanted something that worked right away, so here’s how I got there.
Update: 2018-05-28: It seems that as of Skylake Fedora no longer uses the intel/i915 driver by default if you’re using Xorg. It instead uses the xorg-x11-drv-intel driver which means the testing below is not relevant currently (thanks to Venemo in the comments).
I am doing more testing now against the stock 4.16.10 kernel using the Intel driver in lieu of the xorg-x11-drv-intel driver.
To switch to the intel driver, create an Xorg shim config (as root) and then restart Xorg or reboot.
cat > /etc/X11/xorg.conf.d/10-intel.conf <<EOF
Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
EndSection
EOF
If you were running the xorg Intel drivers you’d have seen something like this in /var/log/Xorg.0.log:
[ 18.547] X.Org Video Driver: 23.0
[ 18.547] X.Org XInput driver : 24.1
[ 18.547] X.Org Server Extension : 10.0
If you are now running the Intel i915 driver you’d see this instead:
[ 17.369] (II) intel: Driver for Intel(R) Integrated Graphics Chipsets: i810, i810-dc100, i810e, i815, i830M, 845G, 854, 852GM/855GM, 865G, 915G, E7221 (i915), 915GM, 945G, 945GM, 945GME, Pineview GM, Pineview G, 965G, G35, 965Q, 946GZ, 965GM, 965GME/GLE, G33, Q35, Q33, GM45, 4 Series, G45/G43, Q45/Q43, G41, B43
[ 17.369] (II) intel: Driver for Intel(R) HD Graphics
[ 17.369] (II) intel: Driver for Intel(R) Iris(TM) Graphics
[ 17.369] (II) intel: Driver for Intel(R) Iris(TM) Pro Graphics
[ 17.370] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20171222
[ 17.372] (--) intel(0): Integrated Graphics Chipset: Intel(R) HD Graphics 620
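Rather than eyeballing the log, the driver that claimed the first screen can be pulled out of it. This is only a sketch: `xorg_screen_driver` is a made-up helper name, and it relies on nothing more than the `name(0):` prefix visible in the lines above. On a system running the generic modesetting driver I would expect it to print `modeset` rather than `intel`, but treat that as an assumption.

```shell
# Report which Xorg video driver claimed screen 0, judging only by the
# "(II) intel(0): ..." style prefixes in an Xorg log.
# xorg_screen_driver is a hypothetical helper name.
xorg_screen_driver() (
    log=${1:-/var/log/Xorg.0.log}
    # Pull the "name" out of each ") name(0): " prefix and keep the first one.
    sed -n 's/^.*) \([A-Za-z0-9_][A-Za-z0-9_]*\)(0): .*$/\1/p' "$log" | head -n 1
)

# Example:
#   xorg_screen_driver /var/log/Xorg.0.log
```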
Update: 2018-05-29: I’m still getting GPU freezes with the 4.16.11 Fedora 28 kernel and the Intel i915 driver.
I have also tried the following kernel parameter which doesn’t help:
I am now testing the latest drm-tip against the Fedora Rawhide 4.17.0-rc kernels, as time permits, to see if and when a fix appears.
Update: 2018-06-04: So far things have been stable for 4 days on 4.17.0-rc7 with the latest Intel drm-tip kernel tree modules copied in. I will keep this updated if I have another freeze.
Commenter Venemo reports he’s still getting this freeze on 4.17.0-rc7, though on a different system.
Update: 2018-06-05: I experienced another GPU freeze, but it took about 5 days of normal usage and dozens of suspend/resumes. For giggles I tried to trigger this from the BIOS screen and was able to get screen artifacts by jerking the laptop around and also by squeezing (a normal amount) the palm rest area of the laptop.
I also tried a Windows 10 USB stateless image and I get the same GPU freezes there too. This makes me believe it’s a hardware defect. I’ve filed a ticket with Lenovo and I’ll be mailing my laptop in for repair/replacement. I’ll let ya know how it goes.
Update: 2018-07-17: I received my laptop back from repair and the motherboard and SATA cables were replaced. I’ve had zero issues for almost a month now on the 4.17.7+ kernel on Fedora 28. Here’s the work receipt from Lenovo (replaced assemblies PCB, motherboard, cables, wire):
I am leaving the rest of this blog post / guide up in case it might be useful for someone tracing down Intel GPU issues on Linux or filing bugs against upstream.
Testing Intel Upstream Linux Kernel Drivers
Below is how I previously tested the latest 4.17.0-rc kernel and Intel drm-tip kernel modules which may still be useful to others so I’m leaving it here.
Temporary Fix for the Intel Graphics Crash
The fix I found was to use the absolute latest Intel drm-tip git kernel code combined with a 4.17.0-rc5 Kernel build. I installed a Fedora development (Rawhide) kernel and then copied the compiled Intel kernel modules into it afterwards. YOLO.
The full docs for setting up the latest Intel stack are here but I’m going to explain just the basics in case you are hitting this as well and want to get up and running quickly.
Build Kernel (Modules) Against drm-tip
This is going to clone a rather large git repository of all the upstream Intel drm bits and build the latest kernel and modules. Note that we’re omitting the actual make install of the kernel; we only care about the modules. You’re going to need these later.
First you’ll need some build and compiler tools. This is what I needed to install beforehand:
sudo dnf install openssl-devel automake gcc elfutils-libelf-devel zlib-devel flex bison
Next build the thing against the latest upstream Intel drm-tip repository. This may take quite some time. Ironically the GPU froze on me a few times trying to build the latest drivers that should supposedly contain the crash fix! Maybe it could sense it.
export MY_DISTRO_PREFIX=/usr
export MY_DISTRO_LIBDIR=/usr/lib64
git clone git://anongit.freedesktop.org/drm-tip
cd drm-tip
make defconfig
# Switch the DRM/i915 options from built-in (=y) to modules (=m)
sed -i 's/CONFIG_DRM_I915=y/CONFIG_DRM_I915=m/g' .config
sed -i 's/CONFIG_DRM=y/CONFIG_DRM=m/g' .config
sed -i 's/CONFIG_DRM_MIPI_DSI=y/CONFIG_DRM_MIPI_DSI=m/g' .config
sed -i 's/CONFIG_DRM_KMS_HELPER=y/CONFIG_DRM_KMS_HELPER=m/g' .config
sed -i 's/CONFIG_DRM_KMS_FB_HELPER=y/CONFIG_DRM_KMS_FB_HELPER=m/g' .config
sed -i 's/CONFIG_DRM_FBDEV_EMULATION=y/CONFIG_DRM_FBDEV_EMULATION=m/g' .config
sed -i 's/CONFIG_DRM_I915_CAPTURE_ERROR=y/CONFIG_DRM_I915_CAPTURE_ERROR=m/g' .config
sed -i 's/CONFIG_DRM_I915_COMPRESS_ERROR=y/CONFIG_DRM_I915_COMPRESS_ERROR=m/g' .config
sed -i 's/CONFIG_DRM_I915_USERPTR=y/CONFIG_DRM_I915_USERPTR=m/g' .config
sed -i 's/CONFIG_DRM_PANEL=y/CONFIG_DRM_PANEL=m/g' .config
sed -i 's/CONFIG_DRM_BRIDGE=y/CONFIG_DRM_BRIDGE=m/g' .config
sed -i 's/CONFIG_DRM_PANEL_BRIDGE=y/CONFIG_DRM_PANEL_BRIDGE=m/g' .config
sed -i 's/CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=y/CONFIG_DRM_PANEL_ORIENTATION_QUIRKS=m/g' .config
# Build the kernel and modules, then install only the modules
make
sudo make modules_install
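A long run of near-identical sed lines is easy to get subtly wrong, so the same edit can be written as a loop. This is a sketch rather than what I originally ran: `set_as_module` is a made-up helper name, and it assumes GNU sed (for `-i`), which is what Fedora ships.

```shell
# Flip a list of kernel config options from built-in (=y) to module (=m)
# in the given config file. set_as_module is a hypothetical helper name.
set_as_module() (
    config=$1
    shift
    for opt in "$@"; do
        # Anchored to whole lines so only exact "OPTION=y" entries change.
        sed -i "s/^${opt}=y\$/${opt}=m/" "$config"
    done
)

# Example, run from the drm-tip checkout after `make defconfig`:
#   set_as_module .config CONFIG_DRM CONFIG_DRM_I915 CONFIG_DRM_KMS_HELPER
#   grep -E '^CONFIG_DRM(_I915|_KMS_HELPER)?=' .config
```

The trailing grep is just a quick way to confirm the options ended up as `=m` before kicking off a long build.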
The last line will copy the kernel modules you create into /lib/modules/KERNEL_VERSION/kernel/drivers/gpu/drm along with a bunch of other kernel drivers we’re not going to need.
If you get a compilation issue about asm-goto support you’ll need to comment that out of arch/x86/Makefile and try again:
#ifndef CC_HAVE_ASM_GOTO
#  $(error Compiler lacks asm-goto support.)
#endif
Install Rawhide Development Kernel
While you could just run the drm-tip Kernel, chances are you’d need a whole lot more modules configured/enabled for your hardware. I find it much easier to just use your distribution’s latest kernel (if it matches the latest upstream), as those are generally better configured for most hardware use cases and you’ll have everything reasonable provided as a loadable module.
You might replace Rawhide here with your distribution’s development / bleeding edge Kernel, like Tumbleweed for openSUSE. For Fedora users I am providing the direct paths here, which may change, so double check the parent location.
cd /tmp/
wget http://download-ib01.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/os/Packages/k/kernel-4.17.0-0.rc5.git1.1.fc29.x86_64.rpm
wget http://download-ib01.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/os/Packages/k/kernel-core-4.17.0-0.rc5.git1.1.fc29.x86_64.rpm
wget http://download-ib01.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/os/Packages/k/kernel-devel-4.17.0-0.rc5.git1.1.fc29.x86_64.rpm
wget http://download-ib01.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/os/Packages/k/kernel-headers-4.17.0-0.rc5.git1.1.fc29.x86_64.rpm
wget http://download-ib01.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/os/Packages/k/kernel-modules-4.17.0-0.rc5.git1.1.fc29.x86_64.rpm
wget http://download-ib01.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/os/Packages/k/kernel-modules-extra-4.17.0-0.rc5.git1.1.fc29.x86_64.rpm
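Since only the package name changes between those six URLs, the list can be generated from a single version string, which makes it less painful to retry with a newer build. A sketch only: `rawhide_kernel_urls` is a made-up helper, and the base URL and package names are just the ones used above, which will go stale as Rawhide moves on.

```shell
# Print the download URL for each kernel sub-package of one build.
# rawhide_kernel_urls is a hypothetical helper; the base URL and package
# names are the ones from this post and may no longer exist.
rawhide_kernel_urls() (
    version=$1
    base=http://download-ib01.fedoraproject.org/pub/fedora/linux/development/rawhide/Workstation/x86_64/os/Packages/k
    for pkg in kernel kernel-core kernel-devel kernel-headers kernel-modules kernel-modules-extra; do
        printf '%s/%s-%s.rpm\n' "$base" "$pkg" "$version"
    done
)

# Example: download everything into the current directory.
#   rawhide_kernel_urls 4.17.0-0.rc5.git1.1.fc29.x86_64 | xargs -n 1 wget
```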
Install the Development 4.17+ Kernel
Now install the Rawhide development kernel and associated packages.
cd /tmp/
sudo dnf localinstall kernel-*.rpm
Copy the drm-tip Kernel GPU Modules Over
The current Rawhide kernel does not have the latest version of the Intel drivers that contain the actual fix, so we’re going to copy them in manually. This is fairly bad practice in general, but we don’t really care: we’d prefer something that works over good etiquette that doesn’t.
Your paths and names may vary, but I copied over the entirety of the /lib/modules/KERNEL_MODULES_YOU_BUILT/kernel/drivers/gpu/drm/* into the modules location of the Rawhide kernel that I just installed.
sudo cp -Rv /lib/modules/4.17.0-rc5+/kernel/drivers/gpu/drm/* /lib/modules/4.17.0-0.rc5.git1.1.fc29.x86_64/kernel/drivers/gpu/drm/
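Because this overwrites files owned by the kernel packages, it is worth keeping a copy of the originals so the change can be undone. The sketch below is not what I ran at the time: `replace_drm_modules` is a made-up helper, the backup directory is whatever you choose, and the optional fourth argument exists only so the function can be tried against a scratch tree instead of /lib/modules.

```shell
# Copy freshly built DRM modules over an installed kernel's copy, saving
# the originals first. replace_drm_modules is a hypothetical helper name.
replace_drm_modules() (
    built=$1      # version string of the drm-tip build, e.g. 4.17.0-rc5+
    target=$2     # version string of the installed distribution kernel
    backup=$3     # directory to create for the original modules
    root=${4:-/lib/modules}
    src=$root/$built/kernel/drivers/gpu/drm
    dst=$root/$target/kernel/drivers/gpu/drm

    [ -d "$src" ] || { echo "no such directory: $src" >&2; exit 1; }
    [ -d "$dst" ] || { echo "no such directory: $dst" >&2; exit 1; }
    [ ! -e "$backup" ] || { echo "backup already exists: $backup" >&2; exit 1; }

    # Keep the distribution's modules outside the module tree.
    cp -a "$dst" "$backup" || exit 1
    # Copy the contents of the built drm directory over the installed one.
    cp -R "$src"/. "$dst"/
)

# Example (as root):
#   replace_drm_modules 4.17.0-rc5+ 4.17.0-0.rc5.git1.1.fc29.x86_64 /root/drm-orig
```

Two caveats, both assumptions on my part rather than things covered above. If the distribution ships compressed modules (for example `i915.ko.xz`), the copied uncompressed files will sit next to them rather than replace them, so it is worth checking which file is actually picked up with `modinfo -k 4.17.0-0.rc5.git1.1.fc29.x86_64 i915`. Running `depmod -a 4.17.0-0.rc5.git1.1.fc29.x86_64` after the copy is probably sensible as well.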
Again, this is not the most elegant fix but it gets the job done. At this point just reboot into the new kernel. If you had crashes before due to the bug I was hitting, hopefully they have gone away.
Usage and Testing
After ~20 hours of GPU torture tests (hundreds of glxgears spinning, open/close images in a shell loop, suspend and resume constantly over and over, old games raging in wine) things seem pretty stable. Before I’d get hard GPU lockups anywhere from 6 minutes to 6 hours into normal desktop usage.
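For anyone wanting to reproduce the glxgears part of that, a rough sketch of a launcher is below. `spawn_many` is a made-up helper name, the count and duration in the example are arbitrary, and it assumes `glxgears` is installed and an X session is running.

```shell
# Start COUNT background copies of a command and print their PIDs, one per
# line, so they can be stopped later. spawn_many is a hypothetical helper.
spawn_many() (
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        # Discard output so the copies do not flood the terminal.
        "$@" >/dev/null 2>&1 &
        echo "$!"
        i=$((i + 1))
    done
)

# Example: 100 glxgears windows for ten minutes, then clean up.
#   pids=$(spawn_many 100 glxgears)
#   sleep 600
#   kill $pids
```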
I realize this is a rather temporary blog post and I’m positive that all this will get fixed in upstream kernels. For now it was important (and frustrating enough) to find a fix as soon as possible and then write about it. I hope this helps someone else.
How to Debug an Intel GPU Crash
If you’re lucky enough to get logs or data written in case of a GPU crash there’s an easy way to gather debug information to file an Intel graphics driver bug.
sudo mount -tdebugfs debug /sys/kernel/debug
sudo cat /sys/kernel/debug/dri/0/i915_error_state > i915_error_state
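To avoid overwriting an earlier capture, or attaching an empty one to a bug report, the copy step can be wrapped in a small function. This is a sketch under assumptions: `save_error_state` is a made-up helper name, and the check for a "no error state" message is my assumption about what the file contains when the driver has recorded nothing.

```shell
# Save a copy of the i915 error state under a timestamped name, but only
# if the driver actually recorded something. save_error_state is a
# hypothetical helper; the "no error state" check is an assumption about
# the placeholder text the driver reports when nothing was captured.
save_error_state() (
    src=${1:-/sys/kernel/debug/dri/0/i915_error_state}
    outdir=${2:-.}
    out=$outdir/i915_error_state.$(date +%Y%m%d-%H%M%S)

    [ -r "$src" ] || { echo "cannot read $src (debugfs mounted? root?)" >&2; exit 1; }
    if grep -qi 'no error state' "$src"; then
        echo "no GPU error state recorded" >&2
        exit 1
    fi
    # Print the path of the saved file on success.
    cat "$src" > "$out" && echo "$out"
)

# Example (as root, after mounting debugfs as shown above):
#   save_error_state
```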