linux-kernel - Re: X86: kexec issues with i915 in 3.14

Message-ID: <1397504974.1059.5.camel@vger.seibold.net>
Date:	Mon, 14 Apr 2014 21:49:34 +0200
From:	Stefani Seibold <stefani@...bold.net>
To:	"Woodhouse, David" <david.woodhouse@...el.com>
Cc:	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"jiang.liu@...ux.intel.com" <jiang.liu@...ux.intel.com>,
	"daniel.vetter@...ll.ch" <daniel.vetter@...ll.ch>,
	"Zanoni, Paulo R" <paulo.r.zanoni@...el.com>,
	"greg@...ah.com" <greg@...ah.com>
Subject: Re: X86: kexec issues with i915 in 3.14

On Monday, 14.04.2014, at 00:28 +0000, Woodhouse, David wrote:
> On Sun, 2014-04-13 at 22:01 +0200, Stefani Seibold wrote:
> > Rebooting my vanilla 3.14 kernel fails with tons of kernel
> > log messages:
> > 
> > [    0.262754] IOMMU: Setting identity map for device 0000:00:1a.0 [0x7c45f000 - 0x7c46bfff]
> > [    0.262780] IOMMU: Setting identity map for device 0000:00:14.0 [0x7c45f000 - 0x7c46bfff]
> > [    0.262798] IOMMU: Prepare 0-16MiB unity mapping for LPC
> > [    0.262807] IOMMU: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
> > [    0.262948] PCI-DMA: Intel(R) Virtualization Technology for Directed I/O
> > [    0.262948] dmar: DRHD: handling fault status reg 3
> > [    0.262951] dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr ffffe000 
> > DMAR:[fault reason 05] PTE Write access is not set
> 
> I'm inferring from the subject line that you mean kexec, not
> "rebooting"?
> 

Rebooting via the BIOS works, but booting via kexec results in either the
message storm or a hung kernel with a corrupted display.

> It looks like a peripheral device is being left active and doing DMA by
> the previous kernel, rather than being shut down. So as soon as the new
> kernel resets the IOMMU mappings, that peripheral device is causing
> faults.
> 
> We really ought to rate-limit the faults and isolate the offending
> device before there are 21,000 of them. As discussed elsewhere recently,
> we could do with a way to tell the PCI layer that it offended us but I
> suppose we could at *least* stop the IOMMU from reporting faults for it.
> 
> Is this new behaviour? I'm not sure why this should have changed...
> 

I can also reproduce the behaviour with a 3.13.7 kernel.

One thing I found after the end of the 21,000 messages was a GPU crash:

[    5.002484] r8169 0000:03:00.0 eth0: link up
[    5.002489] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[    6.745051] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... blitter ring idle
[   11.743768] [drm] stuck on render ring
[   11.743773] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[   11.743774] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[   11.743775] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[   11.743777] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[   11.743778] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[   14.240743] systemd-journald[158]: File /var/log/journal/bb613621feef82d686edde0046e9bcea/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
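
For illustration only, the rate-limiting idea quoted above (stop reporting
faults for a device once it has produced too many) could be sketched in
plain userspace C roughly as follows. This is not the intel-iommu code and
not a proposed patch: the table size, the per-device budget, and the names
pci_bdf(), fault_should_report() and struct fault_slot are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED  16  /* hypothetical: number of devices tracked */
#define FAULT_BUDGET 10  /* hypothetical: faults logged per device before muting */

struct fault_slot {
    uint16_t bdf;    /* PCI address packed as (bus << 8) | (dev << 3) | fn */
    unsigned count;  /* faults seen so far from this device */
    bool     used;
};

static struct fault_slot slots[MAX_TRACKED];

/* Pack a PCI address such as 00:02.0 into 16 bits. */
static uint16_t pci_bdf(unsigned bus, unsigned dev, unsigned fn)
{
    assert(bus < 256 && dev < 32 && fn < 8);
    return (uint16_t)((bus << 8) | (dev << 3) | fn);
}

/*
 * Account one fault for the given device.
 * Returns true while the fault should still be logged and false once the
 * device has used up its budget.
 */
static bool fault_should_report(uint16_t bdf)
{
    struct fault_slot *free_slot = NULL;

    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (slots[i].used && slots[i].bdf == bdf) {
            slots[i].count++;
            return slots[i].count <= FAULT_BUDGET;
        }
        if (!slots[i].used && free_slot == NULL)
            free_slot = &slots[i];
    }

    if (free_slot == NULL)
        return true;  /* table full: fall back to always reporting */

    free_slot->used = true;
    free_slot->bdf = bdf;
    free_slot->count = 1;
    return true;
}
```

With a budget of 10, a device such as 00:02.0 that raises 21,000 faults
would have only its first 10 logged, while faults from other devices would
still be reported. A real fix would also need to stop the device's DMA in
the first place; this sketch only covers the log flood.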

- Stefani
