Message-ID: <aBJIxJ-2Lfke1MGq@google.com>
Date: Wed, 30 Apr 2025 08:59:00 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Carlos Bilbao <bilbao@...edu>, Andrew Morton <akpm@...ux-foundation.org>, carlos.bilbao@...nel.org,
tglx@...utronix.de, jan.glauber@...il.com, pmladek@...e.com,
jani.nikula@...el.com, linux-kernel@...r.kernel.org,
gregkh@...uxfoundation.org, takakura@...inux.co.jp, john.ogness@...utronix.de,
x86@...nel.org
Subject: Re: [PATCH v3 0/2] Reduce CPU consumption after panic
On Wed, Apr 30, 2025, Peter Zijlstra wrote:
> All that said... the default more or less does for(;;) { mdelay(100) },
> if you have a modern chip that should not end up using much power at
> all. That should end up in delay_halt_tpause() or delay_halt_mwaitx()
> (depending on you being on Intel or AMD), and spend most of its time in
> deep idle states.
>
> Is something not working?
The motivation is to coerce vCPUs into yielding the physical CPU so that a
different vCPU can be scheduled in when the host is oversubscribed. IMO, that's
firmly a "host" problem to solve, where the solution might involve educating
customers for their own benefit[*].
I am indifferent as to whether or not the kernel halts during panic(); my
suggestions/feedback in earlier versions were purely to not make any behavior
specific to VMs. I.e. I am strongly opposed to implementing behavior that kicks
in only when running as a guest.
[*] from https://lore.kernel.org/all/Z_lDzyXJ8JKqOyzs@google.com:
: On Fri, Apr 11, 2025 at 9:31 AM Sean Christopherson <seanjc@...gle.com> wrote:
: > > On Wed 2025-03-26 10:12:03, carlos.bilbao@...nel.org wrote:
: > > > After handling a panic, the kernel enters a busy-wait loop, unnecessarily
: > > > consuming CPU and potentially impacting other workloads including other
: > > > guest VMs in the case of virtualized setups.
: >
: > Impacting other guests isn't the guest kernel's problem. If the host has heavily
: > overcommitted CPUs and can't meet SLOs because VMs are panicking and not rebooting,
: > that's a host problem.
: >
: > This could become a customer problem if they're getting billed based on CPU usage,
: > but I don't know that simply doing HLT is the best solution. E.g. advising the
: > customer to configure their kernels to kexec into a kdump kernel or to reboot
: > on panic, seems like it would provide a better overall experience for most.
: >
: > QEMU (assuming y'all use QEMU) also supports a pvpanic device, so unless the VM
: > and/or customer is using a funky setup, the host should already know the guest
: > has panicked. At that point, the host can make appropriate scheduling decisions,
: > e.g. userspace can simply stop running the VM after a certain timeout, throttle
: > it, jail it, etc.
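
For reference, the "reboot on panic" option mentioned in the quoted text can be sketched as a sysctl fragment. This is only an illustration: the file name and the 10-second timeout are assumptions, not values from this thread, and a kdump setup needs additional configuration that is not shown here.

```
# /etc/sysctl.d/90-panic.conf  (hypothetical file name)
#
# kernel.panic controls what the kernel does after panic():
#   0   loop forever (no automatic reboot)
#   >0  reboot after that many seconds
#   <0  reboot immediately
kernel.panic = 10
```

The same timeout can also be given on the kernel command line as panic=10. Either way the guest leaves the post-panic loop on its own, regardless of whether it is running as a VM.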