linux-kernel - Re: drm/mgag200: doesn't work in panic context


Message-Id: <1435735563-5820-1-git-send-email-rui.y.wang@intel.com>
Date:	Wed,  1 Jul 2015 15:26:03 +0800
From:	Rui Wang <rui.y.wang@...el.com>
To:	daniel.vetter@...ll.ch
Cc:	bp@...en8.de, tony.luck@...el.com, airlied@...hat.com,
	robdclark@...il.com, matthew.d.roper@...el.com,
	gong.chen@...el.com, linux-kernel@...r.kernel.org,
	Rui Wang <rui.y.wang@...el.com>
Subject: Re: drm/mgag200: doesn't work in panic context

On Tuesday, June 30, 2015 11:24 PM, Daniel Vetter <daniel.vetter@...ll.ch> wrote:
> On Tue, Jun 30, 2015 at 9:23 AM, Rui Wang <rui.y.wang@...el.com> wrote:
> > But einj does something more than what an IPI can do, it injects hardware
> > errors which trigger exceptions in NMI context... and the exception handler
> > usually panics on fatal errors. And the display may be the only way to catch
> > what has happened. I'm just hoping that the future version may work in
> > NMI context.
> 
> NMI sounds ... ambiguous ;-) But yeah if we can somehow inject
> something as an NMI too then that would be even better. What I want to
> avoid is forcing reboots, since that means you can't run a basic
> modeset test afterwards to make sure nothing was trampled too badly.
> Of course we'd have to replace the screen contents, but the important
> part is that the panic handler doesn't touch anything if the driver is
> in modeset code right now (because it'll massively increase the risk
> of dying completely), and an easy way to check that it didn't step all
> over modeset state unduly is to do a modeset afterwards. If that works
> we'll be fine.
> 
> Also with that approach we can make sure that no real errors get into
> dmesg (as opposed to a real panic), which means we can capture dmesg
> afterwards and if there is a serious log message (or even backtrace)
> then drm panic handling has a bug.
> 
> All that isn't possible when we force a real panic to happen.
> 
> Actually thinking more about NMI that shouldn't be a problem. The
> important thing with nmi vs. hardirq is that you can't even reliably
> grab an irqsave spinlock, it's trylocks all the way down. But that
> also holds for the panic handler, it's trylocks only. Could we somehow
> just check that using lockdep - is there an NMI lockdep context
> somewhere we could fake-grab? That's another upside of using an IPI
> btw: Real panics kill lockdep ;-)

Einj is supported by ACPI in combination with the hardware. The injected
errors result in true MCEs, truly non-maskable. Lockdep might not be useful
in this case. Corrected Errors (CEs) don't result in a panic, but I guess it
might be possible to let them invoke your future mode-setting code for
testing purposes, without rebooting. (Note that MCEs can happen right from
inside your mode-setting code while accessing any memory address.)

But anyway we're not looking for a 100% working solution, so if it could only
work in normal IRQ or IPI context, it'd already be a big plus compared to
what we have now.
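
To make the "trylocks all the way down" point concrete, here is a minimal
userspace sketch, not kernel code. It assumes a single lock guarding modeset
state; the names modeset_lock and panic_try_draw are hypothetical and only
illustrate the rule that a panic-time path must never block and must touch
nothing if a modeset is in progress.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the lock that guards modeset state. */
static pthread_mutex_t modeset_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical panic-time output path. It never waits for the lock:
 * if the lock is already held (a modeset is running), it gives up
 * and leaves the display state alone.
 */
static bool panic_try_draw(const char *msg)
{
    if (pthread_mutex_trylock(&modeset_lock) != 0)
        return false;   /* lock is busy: do not touch anything */

    /* Stand-in for writing the message to the screen. */
    printf("panic screen: %s\n", msg);

    int ret = pthread_mutex_unlock(&modeset_lock);
    assert(ret == 0);
    (void)ret;          /* silence unused warning when NDEBUG is set */
    return true;
}
```

Calling panic_try_draw() while modeset_lock is held returns false right away
instead of deadlocking, which is the behaviour a test triggered from IRQ or
IPI context could check for before running a modeset afterwards.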

Thanks
Rui
