Message-ID: <87edcpn1l3.fsf@jogness.linutronix.de>
Date: Mon, 04 Mar 2024 22:18:24 +0106
From: John Ogness <john.ogness@...utronix.de>
To: Jocelyn Falempe <jfalempe@...hat.com>, Linux Kernel Mailing List
 <linux-kernel@...r.kernel.org>, Daniel Vetter <daniel@...ll.ch>, Andrew
 Morton <akpm@...ux-foundation.org>, "Peter Zijlstra (Intel)"
 <peterz@...radead.org>, Josh Poimboeuf <jpoimboe@...nel.org>, Arnd
 Bergmann <arnd@...db.de>, Kefeng Wang <wangkefeng.wang@...wei.com>, Lukas
 Wunner <lukas@...ner.de>, Uros Bizjak <ubizjak@...il.com>, "Guilherme G.
 Piccoli" <gpiccoli@...lia.com>, Uros Bizjak <ubizjak@...il.com>, Petr
 Mladek <pmladek@...e.com>, Daniel Thompson <daniel.thompson@...aro.org>,
 Douglas Anderson <dianders@...omium.org>
Cc: "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
 David Airlie <airlied@...hat.com>, Thomas Zimmermann <tzimmermann@...e.de>
Subject: Re: [RFC] How to test panic handlers, without crashing the kernel

[Added printk maintainer and kdb folks]

Hi Jocelyn,

On 2024-03-01, Jocelyn Falempe <jfalempe@...hat.com> wrote:
> While writing a panic handler for drm devices [1], I needed a way to 
> test it without crashing the machine.
> So from debugfs, I called 
> atomic_notifier_call_chain(&panic_notifier_list, ...), but it has the 
> side effect of calling all other panic notifiers registered.
>
> So Sima suggested to move that to the generic panic code, and test all 
> panic notifiers with a dedicated debugfs interface.
>
> I can move that code to kernel/, but before doing that, I would like to 
> know if you think that's the right way to test the panic code.

One major event that happens before the panic notifiers is
panic_other_cpus_shutdown(). This can cause special situations because
CPUs can be stopped while holding resources (such as raw spin
locks). And these are the situations that make it so tricky to have safe
and reliable notifiers. If triggered from debugfs, these situations will
never occur.
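
As a rough userspace illustration of that difference (not kernel code;
every model_* and run_* name below is hypothetical, and a C11
atomic_flag stands in for a raw spin lock), the same notifier succeeds
when nothing has been stopped and fails when the lock holder was
stopped before releasing:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a raw spin lock guarding some resource. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;

/* Try to take the lock without waiting; returns true on success. */
static bool model_trylock(void)
{
	return !atomic_flag_test_and_set(&model_lock);
}

static void model_unlock(void)
{
	atomic_flag_clear(&model_lock);
}

/*
 * Model of a panic notifier that needs the resource. It must not spin
 * forever, so it tries once and reports whether it could do its work.
 */
static bool model_notifier(void)
{
	if (!model_trylock())
		return false;
	/* ... the resource would be used here ... */
	model_unlock();
	return true;
}

/*
 * debugfs-style test: every CPU keeps running, so any lock holder
 * eventually releases. Modeled here as the lock simply being free.
 */
static bool run_debugfs_style(void)
{
	return model_notifier();
}

/*
 * Real panic: another CPU took the lock and was then stopped by
 * panic_other_cpus_shutdown() before it could release it.
 */
static bool run_real_panic_style(void)
{
	bool ok;

	model_trylock();	/* the "other CPU" takes the lock, then is stopped */
	ok = model_notifier();
	model_unlock();		/* cleanup for the model only; a stopped CPU never does this */
	return ok;
}
```

In this model run_debugfs_style() returns true and
run_real_panic_style() returns false. A notifier that spins on the lock
instead of using a trylock would hang in the second case.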

My concern is that the tests via debugfs will always succeed, while in
the real world panic notifiers are failing, hanging, or exploding. IMHO
useful panic testing requires real panic'ing.

For my printk panic tests I trigger unknown NMIs while booting with
"unknown_nmi_panic". Particularly with Qemu this is quite easy and
amazingly effective at catching problems. In fact, a recent printk
series [0] fixed seven issues that were found through this method of
panic testing.

> The second question is how to simulate a panic context in a
> non-destructive way, so we can test the panic notifiers in CI, without
> crashing the machine.

I'm wondering if a "fake panic" can be implemented that quiesces all the
other CPUs via NMI (similar to kdb), calls the panic notifiers, and
finally releases everything back to normal. That might produce a fairly
realistic panic situation and should be fairly non-destructive
(depending on what the notifiers do and how long they take).
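
The intended ordering can be sketched as a small userspace model. This
is only an assumption about how such a fake panic might be structured;
nothing here is an existing kernel interface, all model_* names are
made up, and "stopping" a CPU is reduced to setting a flag:

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

/* Hypothetical model state: which CPUs are parked, and which one "panicked". */
struct model_system {
	bool cpu_stopped[MODEL_NR_CPUS];
	int panic_cpu;
};

/* A model notifier returns 0 on success and non-zero on failure. */
typedef int (*model_notifier_fn)(struct model_system *sys);

/* Park or release every CPU except the one running the fake panic. */
static void model_set_others(struct model_system *sys, bool stopped)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu != sys->panic_cpu)
			sys->cpu_stopped[cpu] = stopped;
}

/* Sample notifier: succeeds only if all other CPUs are parked while it runs. */
static int model_notifier_expects_quiesced(struct model_system *sys)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu != sys->panic_cpu && !sys->cpu_stopped[cpu])
			return -1;
	return 0;
}

/* Sample notifier that always fails, standing in for a buggy one. */
static int model_notifier_broken(struct model_system *sys)
{
	(void)sys;
	return -1;
}

/*
 * Fake panic: quiesce the other CPUs, call every notifier, then release
 * everything. Returns the number of notifiers that reported failure.
 */
static int model_fake_panic(struct model_system *sys, int this_cpu,
			    const model_notifier_fn *chain, size_t len)
{
	int failures = 0;

	sys->panic_cpu = this_cpu;
	model_set_others(sys, true);
	for (size_t i = 0; i < len; i++)
		if (chain[i](sys) != 0)
			failures++;
	model_set_others(sys, false);
	return failures;
}
```

A real implementation would additionally have to deal with CPUs that
are parked while holding locks, which is exactly the situation the
notifiers need to be tested against.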

> The worst case for a panic notifier, is when the panic occurs in NMI
> context, but I don't know how to simulate that. The goal would be to
> find early if a panic notifier tries to sleep, or do other things that
> are not allowed in a panic context.

Maybe with a new boot argument "unknown_nmi_fake_panic" that triggers
the fake panic instead?

John Ogness

[0] https://lore.kernel.org/lkml/20240207134103.1357162-1-john.ogness@linutronix.de
