Message-ID:
<SN6PR02MB4157AF2E765F7ED3B9487351D4222@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Tue, 5 Mar 2024 16:23:07 +0000
From: Michael Kelley <mhklinux@...look.com>
To: "Guilherme G. Piccoli" <gpiccoli@...lia.com>, John Ogness
<john.ogness@...utronix.de>, Jocelyn Falempe <jfalempe@...hat.com>, Linux
Kernel Mailing List <linux-kernel@...r.kernel.org>, Daniel Vetter
<daniel@...ll.ch>, Andrew Morton <akpm@...ux-foundation.org>, "Peter Zijlstra
(Intel)" <peterz@...radead.org>, Josh Poimboeuf <jpoimboe@...nel.org>, Arnd
Bergmann <arnd@...db.de>, Kefeng Wang <wangkefeng.wang@...wei.com>, Lukas
Wunner <lukas@...ner.de>, Uros Bizjak <ubizjak@...il.com>, Petr Mladek
<pmladek@...e.com>, Daniel Thompson <daniel.thompson@...aro.org>, Douglas
Anderson <dianders@...omium.org>
CC: "dri-devel@...ts.freedesktop.org" <dri-devel@...ts.freedesktop.org>, David
Airlie <airlied@...hat.com>, Thomas Zimmermann <tzimmermann@...e.de>
Subject: RE: [RFC] How to test panic handlers, without crashing the kernel
From: Guilherme G. Piccoli <gpiccoli@...lia.com> Sent: Monday, March 4, 2024 1:43 PM
>
> On 04/03/2024 18:12, John Ogness wrote:
> > [...]
> >> The second question is how to simulate a panic context in a
> >> non-destructive way, so we can test the panic notifiers in CI, without
> >> crashing the machine.
> >
> > I'm wondering if a "fake panic" can be implemented that quiesces all the
> > other CPUs via NMI (similar to kdb) and then calls the panic
> > notifiers. And finally releases everything back to normal. That might
> > produce a fairly realistic panic situation and should be fairly
> > non-destructive (depending on what the notifiers do and how long they
> > take).
> >
>
> Hi Jocelyn / John,
>
> one concern here is that the panic notifiers are kind of a no man's
> land, so we can have very simple / safe ones, while others are
> destructive in nature.
>
> An example of a well-behaved notifier that is destructive is the
> Hyper-V one, which destroys an essential host-guest interface (called
> "vmbus connection"). What happens if we trigger this one just for
> testing purposes in a debugfs interface? Likely the guest would die...
>
> [+CCing Michael Kelley here since he seems interested in panic and is
> also an expert in Hyper-V, just in case my example is bogus.]

The Hyper-V example is valid. After hv_panic_vmbus_unload()
is called, the VM won't be able to do any disk, network, or graphics
frame buffer I/O. There's no recovery short of restarting the VM.

Michael

[I have retired from Microsoft. I'm still occasionally contributing
to Linux kernel work with email mhklinux@...look.com.]

>
> So, maybe the problem could be split in two: the non-notifier portion of
> the panic path, and the notifiers. Restricting the notifiers you'd run
> may be a way to circumvent the risks: if you could pass a list of the
> specific notifiers you aim to test, this could be interesting. Let's
> see what the others think, and thanks for your work on the DRM panic
> notifier =)
>
> Cheers,
>
>
> Guilherme
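
The idea quoted above, running only an explicit list of notifiers during
a test, could be modelled roughly as follows. This is a user-space
sketch for illustration only, not kernel code and not a proposed
interface: struct test_notifier, run_selected_notifiers(), the
"destructive" flag and the demo_* callbacks are all hypothetical names
invented here. The flag is an added assumption so that a notifier like
the Hyper-V vmbus unload is skipped even if it appears on the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of one panic notifier entry. */
struct test_notifier {
	const char *name;         /* label matched against the allow-list */
	int (*call)(void *data);  /* callback to invoke */
	int destructive;          /* assumption: 1 if it cannot be undone */
};

/* Return 1 if name appears in the allow-list, 0 otherwise. */
static int is_allowed(const char *name, const char *const *allow,
		      size_t n_allow)
{
	for (size_t i = 0; i < n_allow; i++)
		if (strcmp(name, allow[i]) == 0)
			return 1;
	return 0;
}

/*
 * Call only the notifiers that are on the allow-list and are not
 * marked destructive. Returns the number of notifiers called.
 */
static size_t run_selected_notifiers(const struct test_notifier *chain,
				     size_t n_chain,
				     const char *const *allow,
				     size_t n_allow, void *data)
{
	size_t called = 0;

	assert(chain != NULL || n_chain == 0);

	for (size_t i = 0; i < n_chain; i++) {
		if (!is_allowed(chain[i].name, allow, n_allow))
			continue;
		if (chain[i].destructive)
			continue;  /* never run these in a test */
		chain[i].call(data);
		called++;
	}
	return called;
}

/* Demo callbacks that only count how often they were invoked. */
static int drm_calls, vmbus_calls;

static int demo_drm_panic(void *data)
{
	(void)data;
	drm_calls++;
	return 0;
}

static int demo_vmbus_unload(void *data)
{
	(void)data;
	vmbus_calls++;
	return 0;
}

static const struct test_notifier demo_chain[] = {
	{ "demo_drm_panic",    demo_drm_panic,    0 },
	{ "demo_vmbus_unload", demo_vmbus_unload, 1 },
};
```

With an allow-list naming both demo entries, only demo_drm_panic() would
run, because demo_vmbus_unload is marked destructive. A real
implementation would also need some way for notifiers to declare
themselves safe or unsafe, which this sketch simply assumes.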