Message-ID: <20260209191234.GA1387802@noisy.programming.kicks-ass.net>
Date: Mon, 9 Feb 2026 20:12:34 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: Li Zhe <lizhe.67@...edance.com>, jikos@...nel.org, mbenes@...e.cz,
pmladek@...e.com, joe.lawrence@...hat.com,
live-patching@...r.kernel.org, linux-kernel@...r.kernel.org,
qirui.001@...edance.com, vschneid@...hat.com,
dave.hansen@...ux.intel.com
Subject: Re: [PATCH] klp: use stop machine to check and expedite transition
for running tasks
On Tue, Feb 03, 2026 at 06:20:22PM -0800, Josh Poimboeuf wrote:
> On Mon, Feb 02, 2026 at 05:13:34PM +0800, Li Zhe wrote:
> > In the current KLP transition implementation, the strategy for running
> > tasks is to wait for a context switch and then attempt to clear the
> > TIF_PATCH_PENDING flag, either directly or by inspecting the task's
> > stack once it has yielded the CPU. However, this approach proves
> > problematic in certain environments.
> >
> > Consider a scenario where the majority of system CPUs are configured
> > with nohz_full and isolcpus, each dedicated to a VM with a vCPU pinned
> > to that physical core and configured with idle=poll within the guest.
> > Under such conditions, these vCPUs rarely leave the CPU. Combined with
> > the high core counts typical of modern server platforms, this results
> > in transition completion times that are not only excessively prolonged
> > but also highly unpredictable.
> >
> > This patch resolves this issue by registering a callback with
> > stop_machine. The callback attempts to transition the associated running
> > task. In a VM environment configured with 32 CPUs, the live patching
> > operation completes promptly after the SIGNALS_TIMEOUT period with this
> > patch applied; without it, the transition effectively never completes
> > under the same scenario.
> >
> > Co-developed-by: Rui Qi <qirui.001@...edance.com>
> > Signed-off-by: Rui Qi <qirui.001@...edance.com>
> > Signed-off-by: Li Zhe <lizhe.67@...edance.com>
>
> PeterZ, what's your take on this?
>
> I wonder if we could instead do resched_cpu() or something similar to
> trigger the call to klp_sched_try_switch() in __schedule()?
Yeah, this is broken. So the whole point of NOHZ_FULL is to not have the
CPU disturbed, *ever*.
People are working really hard to remove any and all disturbance from
these CPUs with the eventual goal of making any disturbance a fatal
condition (userspace will get a fatal signal if disturbed or so).
Explicitly adding disturbance to NOHZ_FULL is an absolute no-no.
NAK
There are two ways this can be solved:
1) make it a user problem -- userspace wants to load a kernel patch,
userspace can force their QEMU or whatnot through a system call to make
progress
2) fix it properly and do it like the deferred IPI stuff; recognise
that as long as the task is in userspace, it doesn't care about kernel
text changes.
https://lkml.kernel.org/r/20251114150133.1056710-1-vschneid@redhat.com
While 2 sounds easy, the tricky part comes from the fact that you have
to deal with the task coming back to kernel space eventually, possibly
in the middle of your KLP patching. So you've got to do things like that
patch series above, and make sure the whole of the KLP transition
happens while the other CPU is in USER/GUEST context, or that the CPU
waits when it tries to enter the kernel while things are in progress.