linux-kernel - Re: [RFC] How to test panic handlers, without crashing the kernel


Message-ID: <d1d2093c-72a3-4f64-9a8f-9844dc38f0c5@redhat.com>
Date: Tue, 5 Mar 2024 17:52:40 +0100
From: Jocelyn Falempe <jfalempe@...hat.com>
To: Michael Kelley <mhklinux@...look.com>,
 "Guilherme G. Piccoli" <gpiccoli@...lia.com>,
 John Ogness <john.ogness@...utronix.de>,
 Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
 Daniel Vetter <daniel@...ll.ch>, Andrew Morton <akpm@...ux-foundation.org>,
 "Peter Zijlstra (Intel)" <peterz@...radead.org>,
 Josh Poimboeuf <jpoimboe@...nel.org>, Arnd Bergmann <arnd@...db.de>,
 Kefeng Wang <wangkefeng.wang@...wei.com>, Lukas Wunner <lukas@...ner.de>,
 Uros Bizjak <ubizjak@...il.com>, Petr Mladek <pmladek@...e.com>,
 Daniel Thompson <daniel.thompson@...aro.org>,
 Douglas Anderson <dianders@...omium.org>
Cc: "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>,
 David Airlie <airlied@...hat.com>, Thomas Zimmermann <tzimmermann@...e.de>
Subject: Re: [RFC] How to test panic handlers, without crashing the kernel

On 05/03/2024 17:23, Michael Kelley wrote:
> From: Guilherme G. Piccoli <gpiccoli@...lia.com> Sent: Monday, March 4, 2024 1:43 PM
>>
>> On 04/03/2024 18:12, John Ogness wrote:
>>> [...]
>>>> The second question is how to simulate a panic context in a
>>>> non-destructive way, so we can test the panic notifiers in CI, without
>>>> crashing the machine.
>>>
>>> I'm wondering if a "fake panic" can be implemented that quiesces all the
>>> other CPUs via NMI (similar to kdb) and then calls the panic
>>> notifiers. And finally releases everything back to normal. That might
>>> produce a fairly realistic panic situation and should be fairly
>>> non-destructive (depending on what the notifiers do and how long they
>>> take).
>>>
>>
>> Hi Jocelyn / John,
>>
>> one concern here is that the panic notifiers are kind of a no man's
>> land, so we can have very simple / safe ones, while others are
>> destructive in nature.
>>
>> An example of a well-behaved notifier that is destructive is the
>> Hyper-V one, that destroys an essential host-guest interface (called
>> "vmbus connection"). What happens if we trigger this one just for
>> testing purposes in a debugfs interface? Likely the guest would die...
>>
>> [+CCing Michael Kelley here since he seems interested in panic and is
>> also expert in Hyper-V, just in case my example is bogus.]
> 
> The Hyper-V example is valid. After hv_panic_vmbus_unload()
> is called, the VM won't be able to do any disk, network, or graphics
> frame buffer I/O. There's no recovery short of restarting the VM.

Thanks for the confirmation.
> 
> Michael
> 
> [I have retired from Microsoft.  I'm still occasionally contributing
> to Linux kernel work with email mhklinux@...look.com.]
> 
>>
>> So, maybe the problem could be split in two: the non-notifier portion of
>> the panic path, and the notifiers. Restricting the notifiers you run
>> might be a way to circumvent the risks; for example, being able to pass a
>> list of the specific notifiers you aim to test could be
>> interesting. Let's see what the others think and thanks for your work in
>> the DRM panic notifier =)

Or maybe have two lists of panic notifiers, a safe list and a destructive
list, so that a fake panic calls only the safe notifiers.
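
To make the two-list idea concrete, here is a rough user-space sketch. It is
only a model: struct panic_notifier, notifier_register(), fake_panic_notify()
and the example_* callbacks are all hypothetical names, and none of this is
the kernel's actual notifier chain API. The destructive callback merely
stands in for something like the Hyper-V unload discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model, NOT the kernel's notifier_block API. */
struct panic_notifier {
	const char *name;
	int (*call)(void *data);
	struct panic_notifier *next;
};

struct notifier_list {
	struct panic_notifier *head;
};

/* Two separate lists, as proposed: safe and destructive notifiers. */
static struct notifier_list safe_list;
static struct notifier_list destructive_list;

/* Add a notifier at the head of the given list. */
static void notifier_register(struct notifier_list *list,
			      struct panic_notifier *n)
{
	assert(list != NULL && n != NULL && n->call != NULL);
	n->next = list->head;
	list->head = n;
}

/* Call every notifier on a list; return how many were called. */
static int run_list(const struct notifier_list *list, void *data)
{
	int count = 0;

	for (struct panic_notifier *n = list->head; n != NULL; n = n->next) {
		n->call(data);
		count++;
	}
	return count;
}

/* A real panic runs both lists. */
static int real_panic_notify(void *data)
{
	return run_list(&safe_list, data) + run_list(&destructive_list, data);
}

/* A fake (test) panic runs only the safe list. */
static int fake_panic_notify(void *data)
{
	return run_list(&safe_list, data);
}

/* Example callbacks (hypothetical): they only count invocations. */
static int screen_draws;
static int connection_unloads;

static int example_draw_panic_screen(void *data)
{
	(void)data;
	screen_draws++;
	return 0;
}

static int example_unload_connection(void *data)
{
	(void)data;
	connection_unloads++;	/* stands in for an unrecoverable action */
	return 0;
}

static struct panic_notifier draw_notifier = {
	.name = "example_draw_panic_screen",
	.call = example_draw_panic_screen,
};

static struct panic_notifier unload_notifier = {
	.name = "example_unload_connection",
	.call = example_unload_connection,
};
```

With draw_notifier registered on safe_list and unload_notifier on
destructive_list, fake_panic_notify() would reach only the drawing callback,
while real_panic_notify() would reach both. Who decides which list a
notifier belongs on is left open here.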

>>
>> Cheers,
>>
>>
>> Guilherme
>