Message-ID: <CAKMK7uE4thXWwuuHeZcrhXQJQNtd9NnjsvAouN1gfH0UNxTTxg@mail.gmail.com>
Date:	Wed, 1 Jul 2015 11:59:34 +0200
From:	Daniel Vetter <daniel.vetter@...ll.ch>
To:	Rui Wang <rui.y.wang@...el.com>
Cc:	Borislav Petkov <bp@...en8.de>, "Luck, Tony" <tony.luck@...el.com>,
	Dave Airlie <airlied@...hat.com>,
	"Clark, Rob" <robdclark@...il.com>,
	Matthew D Roper <matthew.d.roper@...el.com>,
	"Chen, Gong" <gong.chen@...el.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: drm/mgag200: doesn't work in panic context

On Wed, Jul 1, 2015 at 9:26 AM, Rui Wang <rui.y.wang@...el.com> wrote:
> On Tuesday, June 30, 2015 11:24 PM, Daniel Vetter <daniel.vetter@...ll.ch> wrote:
>> On Tue, Jun 30, 2015 at 9:23 AM, Rui Wang <rui.y.wang@...el.com> wrote:
>> > But einj does something more than what an IPI can do, it injects hardware
>> > errors which trigger exceptions in NMI context... and the exception handler
>> > usually panics on fatal errors. And the display may be the only way to catch
>> > what has happened. I'm just hoping that the future version may work in
>> > NMI context.
>>
>> NMI sounds ... ambitious ;-) But yeah if we can somehow inject
>> something as an NMI too then that would be even better. What I want to
>> avoid is forcing reboots, since that means you can't run a basic
>> modeset test afterwards to make sure nothing was trampled too badly.
>> Of course we'd have to replace the screen contents, but the important
>> part is that the panic handler doesn't touch anything if the driver is
>> in modeset code right now (because it'll massively increase the risk
>> of dying completely), and an easy way to check that it didn't step all
>> over modeset state unduly is to do a modeset afterwards. If that works
>> we'll be fine.
>>
>> Also with that approach we can make sure that no real errors get into
>> dmesg (as opposed to a real panic), which means we can capture dmesg
>> afterwards and if there is a serious log message (or even backtrace)
>> then drm panic handling has a bug.
>>
>> All that isn't possible when we force a real panic to happen.
>>
>> Actually thinking more about NMI that shouldn't be a problem. The
>> important thing with nmi vs. hardirq is that you can't even reliably
>> grab an irqsave spinlock, it's trylocks all the way down. But that
>> also holds for the panic handler, it's trylocks only. Could we somehow
>> just check that using lockdep - is there an NMI lockdep context
>> somewhere we could fake-grab? That's another upside of using an IPI
>> btw: Real panics kill lockdep ;-)
>
> Einj is supported by ACPI in combination with the hardware. The injected
> errors result in true MCEs, truly non-maskable. Lockdep might not be useful
> in this case. Corrected Errors (CEs) don't result in panic but I guess it
> might be possible to let it invoke your future mode-setting code for testing
> purpose, without rebooting. (Notice that MCEs can happen right from inside
> your mode-setting code while accessing any memory address)

Yeah, an NMI can happen anywhere, and that's about the worst-case panic
context we have. The problem is that NMI bugs are a giant pain to
debug, so for testing I think it'd be better to just use hardirq
context plus the help of lockdep (if possible) to make sure we only
use trylocks and lockless code.
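
To make the "trylocks only" rule concrete, here is a minimal userspace
sketch, assuming C11 atomics. It is not kernel or drm code: the names
fake_modeset_lock, fake_trylock, fake_unlock and fake_panic_draw are
hypothetical and exist only for illustration. The point it shows is that
the panic path never waits for a lock. If the lock is already held
(standing in for "the driver is in modeset code right now"), it backs
off without touching anything.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver lock; not a real drm structure. */
struct fake_modeset_lock {
	atomic_flag held;
};

/* Never blocks: returns true only if the lock was free and is now ours. */
static bool fake_trylock(struct fake_modeset_lock *l)
{
	/* test_and_set returns the previous value; false means it was free. */
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void fake_unlock(struct fake_modeset_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Hypothetical panic-time draw routine. Returns true if it wrote to the
 * (fake) framebuffer, false if it skipped because the lock was taken.
 */
static bool fake_panic_draw(struct fake_modeset_lock *l, int *fb_word)
{
	if (!fake_trylock(l))
		return false;	/* modeset in progress: touch nothing */

	*fb_word = 0xdead;	/* stand-in for drawing the panic message */
	fake_unlock(l);
	return true;
}
```

A caller would initialize the lock with ATOMIC_FLAG_INIT and can then
check both paths: with the lock free, fake_panic_draw() writes and
returns true; with the lock held, it returns false and leaves the
framebuffer word unchanged.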

> But anyway we're not looking for a 100% working solution so if it could only
> work in normal irq or ipi context, it'd already be a big plus compared to
> what we have now.

NMI vs. IPI vs. other contexts is just a question of the best
debug/testing strategy. Most of the work there will really be in writing
tons of testcases to race the drm panic handler against drm modeset ioctls.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch