Message-ID: <7b3a0288-20f9-42cf-af81-e10ad2d04b27@gmail.com>
Date: Fri, 21 Mar 2025 08:01:52 -0500
From: Carlos Bilbao <carlos.bilbao.osdev@...il.com>
To: pmladek@...e.com, Andrew Morton <akpm@...ux-foundation.org>,
 jani.nikula@...el.com, open list <linux-kernel@...r.kernel.org>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Thomas Gleixner <tglx@...utronix.de>, takakura@...inux.co.jp,
 john.ogness@...utronix.de
Cc: jglauber@...italocean.com
Subject: [RFC] panic: reduce CPU consumption when finished handling panic

Hello again,


I thought it would be helpful to share some numbers to support my claim,
along with a couple of ideas to improve the patch. Below are the perf stats
collected on the hypervisor after triggering a panic in a guest running
kernel v5.15 (details of the experiment are at the end of this message).


Samples: 55K of event 'cycles:P', Event count (approx.): 36090772574
Overhead  Command          Shared Object            Symbol
  42.20%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vmexit
  19.07%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_spec_ctrl_restore_host
   9.73%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_enter_exit
   3.60%  CPU 5/KVM        [kernel.kallsyms]        [k] __flush_smp_call_function_queue
   2.91%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_run
   2.85%  CPU 5/KVM        [kernel.kallsyms]        [k] native_irq_return_iret
   2.67%  CPU 5/KVM        [kernel.kallsyms]        [k] native_flush_tlb_one_user
   2.16%  CPU 5/KVM        [kernel.kallsyms]        [k] llist_reverse_order
   2.10%  CPU 5/KVM        [kernel.kallsyms]        [k] __srcu_read_lock
   2.08%  CPU 5/KVM        [kernel.kallsyms]        [k] flush_tlb_func
   1.52%  CPU 5/KVM        [kernel.kallsyms]        [k] vcpu_enter_guest.constprop.0
   1.50%  CPU 5/KVM        [kernel.kallsyms]        [k] native_apic_msr_eoi
   1.01%  CPU 5/KVM        [kernel.kallsyms]        [k] clear_bhb_loop
   0.66%  CPU 5/KVM        [kernel.kallsyms]        [k] sysvec_call_function_single


And here are the results from the hypervisor after applying my patch to the
guest kernel:


Samples: 28  of event 'cycles:P', Event count (approx.): 28961952
Overhead  Command          Shared Object            Symbol
  11.03%  qemu-system-x86  [kernel.kallsyms]        [k] task_mm_cid_work
  11.03%  qemu-system-x86  qemu-system-x86_64       [.] 0x0000000000579944
   9.80%  qemu-system-x86  qemu-system-x86_64       [.] 0x000000000056512b
   8.45%  IO mon_iothread  libc.so.6                [.] 0x00000000000a3f12
   8.45%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_mutex_lock
   7.51%  IO mon_iothread  [kernel.kallsyms]        [k] avg_vruntime
   6.65%  IO mon_iothread  libc.so.6                [.] write
   5.93%  IO mon_iothread  [kernel.kallsyms]        [k] security_file_permission
   4.97%  qemu-system-x86  libglib-2.0.so.0.7200.4  [.] g_thread_self
   4.64%  IO mon_iothread  [kernel.kallsyms]        [k] aa_label_sk_perm.part.0
   4.13%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_release
   3.79%  IO mon_iothread  [kernel.kallsyms]        [k] seccomp_run_filters
   3.42%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_dispatch
   3.42%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000004edbab
   3.28%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000005999c8
   3.09%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000004e636b
   0.22%  qemu-system-x86  [kernel.kallsyms]        [k] __intel_pmu_enable_all.constprop.0


As you can see, CPU consumption during panic is significantly reduced after
applying the proposed change, with KVM-related functions (e.g.,
vmx_vmexit) dropping from more than 70% of CPU usage to virtually nothing.
The number of samples also decreased from 55K to 28, and the event count
dropped from 36.09 billion to 28.96 million.


Jan suggested that a better way to implement cpu_halt_end_panic() (perhaps
cpu_halt_after_panic() is a better name) would be to define it as a weak
function in asm-generic, allowing architectures to override it. What do you
think?
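
If that's the preferred direction, below is roughly what I have in mind
(just a sketch based on my reading of the suggestion; the name
cpu_halt_after_panic() and the exact file placement are of course open to
discussion):

/* Generic fallback (e.g. in kernel/panic.c), marked __weak so that an
 * architecture can provide its own implementation: */
void __weak cpu_halt_after_panic(void)
{
	/* Preserve today's behavior where no arch override exists. */
	mdelay(PANIC_TIMER_STEP);
}

/* Example arch override (e.g. somewhere under arch/x86/), halting with
 * interrupts enabled so pending work can still be processed, as in the
 * original patch: */
void cpu_halt_after_panic(void)
{
	native_safe_halt();
}

The final loop in panic() would then simply call cpu_halt_after_panic(),
and the #ifdef CONFIG_X86 / CONFIG_ARM block in my patch goes away.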

Thank you in advance!

Regards,
Carlos

---

Details on the experiment:

- Linux kernel v5.15 (commit 8bb7eca)

- VM guest CPU: Intel(R) Xeon(R) Gold 5318Y CPU @ 2.10GHz

- To collect samples, I executed:
  /usr/bin/perf record -p 2618527 -a sleep 30

- Guest image: Ubuntu 22.04 LTS x64, 8 vCPUs, 16 GB RAM / 100 GB disk
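
For anyone reproducing this: the exact trigger shouldn't matter, since any
path that reaches panic() ends up in the same loop. One simple option from
inside the guest (assuming CONFIG_MAGIC_SYSRQ and sysrq are enabled) is:
  echo c > /proc/sysrq-trigger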


Thanks,

Carlos


On 3/17/25 17:01, Carlos Bilbao wrote:
> After the kernel has finished handling a panic, it enters a busy-wait loop.
> But this unnecessarily consumes CPU power and electricity. Plus, in VMs,
> this negatively impacts the throughput of other VM guests running on the
> same hypervisor.
>
> I propose introducing a function cpu_halt_end_panic() to halt the CPU
> during this state while still allowing interrupts to be processed. See my
> commit below.
>
> Thanks in advance!
>
> Signed-off-by: Carlos Bilbao <carlos.bilbao@...nel.org>
> ---
>  kernel/panic.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index fbc59b3b64d0..c00ccaa698d5 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -276,6 +276,21 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>          crash_smp_send_stop();
>  }
>  
> +static void cpu_halt_end_panic(void)
> +{
> +#ifdef CONFIG_X86
> +    native_safe_halt();
> +#elif defined(CONFIG_ARM)
> +    cpu_do_idle();
> +#else
> +    /*
> +     * Default to a simple busy-wait if no architecture-specific halt is
> +     * defined above
> +     */
> +    mdelay(PANIC_TIMER_STEP);
> +#endif
> +}
> +
>  /**
>   *    panic - halt the system
>   *    @fmt: The text string to print
> @@ -474,7 +489,7 @@ void panic(const char *fmt, ...)
>              i += panic_blink(state ^= 1);
>              i_next = i + 3600 / PANIC_BLINK_SPD;
>          }
> -        mdelay(PANIC_TIMER_STEP);
> +        cpu_halt_end_panic();
>      }
>  }
>  
