[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <bba47a8a-d2e6-433f-8128-b2a7fc05414d@gmail.com>
Date: Tue, 22 Apr 2025 07:44:21 -0500
From: Carlos Bilbao <carlos.bilbao.osdev@...il.com>
To: Sean Christopherson <seanjc@...gle.com>, Petr Mladek <pmladek@...e.com>,
jan.glauber@...il.com
Cc: carlos.bilbao@...nel.org, tglx@...utronix.de, bilbao@...edu,
akpm@...ux-foundation.org, jani.nikula@...el.com,
linux-kernel@...r.kernel.org, gregkh@...uxfoundation.org,
takakura@...inux.co.jp, john.ogness@...utronix.de,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jiri Slaby <jirislaby@...nel.org>, Carlos Bilbao <bilbao@...edu>
Subject: Re: [PATCH 1/2] panic: Allow archs to reduce CPU consumption after
panic
Thanks for the feedback, everyone!
On 4/11/25 11:31, Sean Christopherson wrote:
> On Fri, Apr 11, 2025, Petr Mladek wrote:
>> Added Linus and Jiri (tty) into Cc.
>>
>> On Wed 2025-03-26 10:12:03, carlos.bilbao@...nel.org wrote:
>>> From: Carlos Bilbao <carlos.bilbao@...nel.org>
>>>
>>> After handling a panic, the kernel enters a busy-wait loop, unnecessarily
>>> consuming CPU and potentially impacting other workloads including other
>>> guest VMs in the case of virtualized setups.
> Impacting other guests isn't the guest kernel's problem. If the host has heavily
> overcommited CPUs and can't meet SLOs because VMs are panicking and not rebooting,
> that's a host problem.
>
> This could become a customer problem if they're getting billed based on CPU usage,
> but I don't know that simply doing HLT is the best solution. E.g. advising the
> customer to configure their kernels to kexec into a kdump kernel or to reboot
> on panic, seems like it would provide a better overall experience for most.
>
> QEMU (assuming y'all use QEMU) also supports a pvpanic device, so unless the VM
> and/or customer is using a funky setup, the host should already know the guest
> has panicked. At that point, the host can make appropiate scheduling decisions,
> e.g. userspace can simply stop running the VM after a certain timeout, throttle
> it, jail it, etc.
>
>>> Introduce cpu_halt_after_panic(), a weak function that archs can override
>>> for a more efficient halt of CPU work. By default, it preserves the
>>> pre-existing behavior of delay.
>>>
>>> Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@...nel.org>
>>> Reviewed-by: Jan Glauber (DigitalOcean) <jan.glauber@...il.com>
>>> ---
>>> kernel/panic.c | 12 +++++++++++-
>>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/panic.c b/kernel/panic.c
>>> index fbc59b3b64d0..fafe3fa22533 100644
>>> --- a/kernel/panic.c
>>> +++ b/kernel/panic.c
>>> @@ -276,6 +276,16 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>>> crash_smp_send_stop();
>>> }
>>>
>>> +/*
>>> + * Called after a kernel panic has been handled, at which stage halting
>>> + * the CPU can help reduce unnecessary CPU consumption. In the absence of
>>> + * arch-specific implementations, just delay
>>> + */
>>> +static void __weak cpu_halt_after_panic(void)
>>> +{
>>> + mdelay(PANIC_TIMER_STEP);
>>> +}
>>> +
>>> /**
>>> * panic - halt the system
>>> * @fmt: The text string to print
>>> @@ -474,7 +484,7 @@ void panic(const char *fmt, ...)
>>> i += panic_blink(state ^= 1);
>>> i_next = i + 3600 / PANIC_BLINK_SPD;
>>> }
>>> - mdelay(PANIC_TIMER_STEP);
>>> + cpu_halt_after_panic();
>> The 2nd patch implements this functions using safe_halt().
>>
>> IMHO, it makes perfect sense to halt a virtualized system or
> I really, really don't want to arbitrarily single out VMs, especially in core
> kernel code, as doing so risks creating problems that are unique to VMs. If the
> behavior is based on a virtualization-agnostic heuristic like "no console", then
> by all means, but please don't condition something like this purely based on
> running in a VM.
>
> I also have no objection to the host/hypervisor prescribing specific behavior
> (see below). What I don't like is the kernel deciding to do things differently
> because it's a VM, without any knowledge of the environment in which the VM is
> running.
Sean, I understand your point that we shouldn’t make core kernel behavior arbitrarily VM-specific. Even though, arguably, my RFC cover letter focused on VMs -- as that’s where I detected the issue (without oversubscription, BTW) -- the intention is to provide a more general post-panic solution that can help both VM/bare-metal envs. As Jan mentioned, the current default (a
CPU spinning forever after panic()) is often suboptimal.
>> a system without a registered "graphical" console.
>>
>> I think that the blinking diods were designed for desktops
>> and laptops with some "graphical" terminal attached. The
>> infinite loop allows to read the last lines, ideally
>> the backtrace on the screen.
>>
>> I wonder if we could make it more clever and do the halt
>> only when the system is virtualized or when there is no
>> "graphical" console registered.
> Could we do something with panic_notifier_list? E.g. let panic devices override
> the default post-print behavior. Then VMs with a paravirt panic interface could
> make an explicit call out to the host instead of relying on arch specific code
> to HLT and hope for the best. And maybe console drivers that want/need to keep
> the CPU running can nullify panic_halt?
I like your suggestion of a panic_set_hlt() API with a priority mechanism
-- it’s more flexible than a weak function, and I’m happy to integrate that
into v2. Here's what I propose:
Patch 1: Introduce panic_set_hlt(func, prio).
Patch 2: Register a fallback safe_halt() implementation at priority 0
that just calls safe_halt().
Please, LMK if you've any concerns with that plan.
> E.g. with a rudimentary priority system so that paravirt devices can override
> everything else.
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index d8635d5cecb2..fe9007222308 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -141,6 +141,18 @@ static long no_blink(int state)
> long (*panic_blink)(int state);
> EXPORT_SYMBOL(panic_blink);
>
> +static void (*panic_halt)(void) = cpu_halt_after_panic;
> +static int panic_hlt_priority;
> +
> +void panic_set_hlt(void (*fn)(void), int priority)
> +{
> + if (priority <= panic_hlt_priority)
> + return;
> +
> + panic_halt = fn;
> +}
> +EXPORT_SYMBOL_GPL(panic_set_hlt);
> +
> /*
> * Stop ourself in panic -- architecture code may override this
> */
> @@ -467,6 +479,9 @@ void panic(const char *fmt, ...)
> console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> nbcon_atomic_flush_unsafe();
>
> + if (panic_halt)
> + panic_halt();
> +
> local_irq_enable();
> for (i = 0; ; i += PANIC_TIMER_STEP) {
> touch_softlockup_watchdog();
>
Thanks,
Carlos
Powered by blists - more mailing lists