linux-kernel - Re: [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <bba47a8a-d2e6-433f-8128-b2a7fc05414d@gmail.com>
Date: Tue, 22 Apr 2025 07:44:21 -0500
From: Carlos Bilbao <carlos.bilbao.osdev@...il.com>
To: Sean Christopherson <seanjc@...gle.com>, Petr Mladek <pmladek@...e.com>,
 jan.glauber@...il.com
Cc: carlos.bilbao@...nel.org, tglx@...utronix.de, bilbao@...edu,
 akpm@...ux-foundation.org, jani.nikula@...el.com,
 linux-kernel@...r.kernel.org, gregkh@...uxfoundation.org,
 takakura@...inux.co.jp, john.ogness@...utronix.de,
 Linus Torvalds <torvalds@...ux-foundation.org>,
 Jiri Slaby <jirislaby@...nel.org>, Carlos Bilbao <bilbao@...edu>
Subject: Re: [PATCH 1/2] panic: Allow archs to reduce CPU consumption after
 panic

Thanks for the feedback, everyone!


On 4/11/25 11:31, Sean Christopherson wrote:

> On Fri, Apr 11, 2025, Petr Mladek wrote:
>> Added Linus and Jiri (tty) into Cc.
>>
>> On Wed 2025-03-26 10:12:03, carlos.bilbao@...nel.org wrote:
>>> From: Carlos Bilbao <carlos.bilbao@...nel.org>
>>>
>>> After handling a panic, the kernel enters a busy-wait loop, unnecessarily
>>> consuming CPU and potentially impacting other workloads including other
>>> guest VMs in the case of virtualized setups.
> Impacting other guests isn't the guest kernel's problem.  If the host has heavily
> overcommited CPUs and can't meet SLOs because VMs are panicking and not rebooting,
> that's a host problem.
>
> This could become a customer problem if they're getting billed based on CPU usage,
> but I don't know that simply doing HLT is the best solution.  E.g. advising the
> customer to configure their kernels to kexec into a kdump kernel or to reboot
> on panic, seems like it would provide a better overall experience for most.
>
> QEMU (assuming y'all use QEMU) also supports a pvpanic device, so unless the VM
> and/or customer is using a funky setup, the host should already know the guest
> has panicked.  At that point, the host can make appropiate scheduling decisions,
> e.g. userspace can simply stop running the VM after a certain timeout, throttle
> it, jail it, etc.
>
>>> Introduce cpu_halt_after_panic(), a weak function that archs can override
>>> for a more efficient halt of CPU work. By default, it preserves the
>>> pre-existing behavior of delay.
>>>
>>> Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@...nel.org>
>>> Reviewed-by: Jan Glauber (DigitalOcean) <jan.glauber@...il.com>
>>> ---
>>>  kernel/panic.c | 12 +++++++++++-
>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/panic.c b/kernel/panic.c
>>> index fbc59b3b64d0..fafe3fa22533 100644
>>> --- a/kernel/panic.c
>>> +++ b/kernel/panic.c
>>> @@ -276,6 +276,16 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>>>  		crash_smp_send_stop();
>>>  }
>>>  
>>> +/*
>>> + * Called after a kernel panic has been handled, at which stage halting
>>> + * the CPU can help reduce unnecessary CPU consumption. In the absence of
>>> + * arch-specific implementations, just delay
>>> + */
>>> +static void __weak cpu_halt_after_panic(void)
>>> +{
>>> +	mdelay(PANIC_TIMER_STEP);
>>> +}
>>> +
>>>  /**
>>>   *	panic - halt the system
>>>   *	@fmt: The text string to print
>>> @@ -474,7 +484,7 @@ void panic(const char *fmt, ...)
>>>  			i += panic_blink(state ^= 1);
>>>  			i_next = i + 3600 / PANIC_BLINK_SPD;
>>>  		}
>>> -		mdelay(PANIC_TIMER_STEP);
>>> +		cpu_halt_after_panic();
>> The 2nd patch implements this functions using safe_halt().
>>
>> IMHO, it makes perfect sense to halt a virtualized system or
> I really, really don't want to arbitrarily single out VMs, especially in core
> kernel code, as doing so risks creating problems that are unique to VMs.  If the
> behavior is based on a virtualization-agnostic heuristic like "no console", then
> by all means, but please don't condition something like this purely based on
> running in a VM.
>
> I also have no objection to the host/hypervisor prescribing specific behavior
> (see below).  What I don't like is the kernel deciding to do things differently
> because it's a VM, without any knowledge of the environment in which the VM is
> running.


Sean, I understand your point that we shouldn’t make core kernel behavior arbitrarily VM-specific. Even though, arguably, my RFC cover letter focused on VMs -- as that’s where I detected the issue (without oversubscription, BTW) -- the intention is to provide a more general post-panic solution that can help both VM/bare-metal envs. As Jan mentioned, the current default (a
CPU spinning forever after panic()) is often suboptimal.


>> a system without a registered "graphical" console.
>>
>> I think that the blinking diods were designed for desktops
>> and laptops with some "graphical" terminal attached. The
>> infinite loop allows to read the last lines, ideally
>> the backtrace on the screen.
>>
>> I wonder if we could make it more clever and do the halt
>> only when the system is virtualized or when there is no
>> "graphical" console registered.
> Could we do something with panic_notifier_list?  E.g. let panic devices override
> the default post-print behavior.  Then VMs with a paravirt panic interface could
> make an explicit call out to the host instead of relying on arch specific code
> to HLT and hope for the best.  And maybe console drivers that want/need to keep
> the CPU running can nullify panic_halt?


I like your suggestion of a panic_set_hlt() API with a priority mechanism
-- it’s more flexible than a weak function, and I’m happy to integrate that
into v2. Here's what I propose:

    Patch 1: Introduce panic_set_hlt(func, prio).
    Patch 2: Register a fallback safe_halt() implementation at priority 0
             that just calls safe_halt().

Please, LMK if you've any concerns with that plan.


> E.g. with a rudimentary priority system so that paravirt devices can override
> everything else.
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index d8635d5cecb2..fe9007222308 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -141,6 +141,18 @@ static long no_blink(int state)
>  long (*panic_blink)(int state);
>  EXPORT_SYMBOL(panic_blink);
>  
> +static void (*panic_halt)(void) = cpu_halt_after_panic;
> +static int panic_hlt_priority;
> +
> +void panic_set_hlt(void (*fn)(void), int priority)
> +{
> +       if (priority <= panic_hlt_priority)
> +               return;
> +
> +       panic_halt = fn;
> +}
> +EXPORT_SYMBOL_GPL(panic_set_hlt);
> +
>  /*
>   * Stop ourself in panic -- architecture code may override this
>   */
> @@ -467,6 +479,9 @@ void panic(const char *fmt, ...)
>         console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>         nbcon_atomic_flush_unsafe();
>  
> +       if (panic_halt)
> +               panic_halt();
> +
>         local_irq_enable();
>         for (i = 0; ; i += PANIC_TIMER_STEP) {
>                 touch_softlockup_watchdog();
>

Thanks,

Carlos