Message-Id: <20260204024756.6776-1-lizhe.67@bytedance.com>
Date: Wed, 4 Feb 2026 10:47:56 +0800
From: "Li Zhe" <lizhe.67@...edance.com>
To: <jpoimboe@...nel.org>
Cc: <jikos@...nel.org>, <joe.lawrence@...hat.com>,
<linux-kernel@...r.kernel.org>, <live-patching@...r.kernel.org>,
<lizhe.67@...edance.com>, <mbenes@...e.cz>, <peterz@...radead.org>,
<pmladek@...e.com>, <qirui.001@...edance.com>
Subject: Re: [PATCH] klp: use stop machine to check and expedite transition for running tasks

On Tue, 3 Feb 2026 18:20:22 -0800, jpoimboe@...nel.org wrote:
> On Mon, Feb 02, 2026 at 05:13:34PM +0800, Li Zhe wrote:
> > In the current KLP transition implementation, the strategy for running
> > tasks relies on waiting for a context switch to attempt to clear the
> > TIF_PATCH_PENDING flag. Alternatively, the stack is inspected once the
> > process has yielded the CPU to determine whether the TIF_PATCH_PENDING
> > flag can be cleared. However, this approach proves problematic
> > in certain environments.
> >
> > Consider a scenario where the majority of system CPUs are configured
> > with nohz_full and isolcpus, each dedicated to a VM with a vCPU pinned
> > to that physical core and configured with idle=poll within the guest.
> > Under such conditions, these vCPUs rarely leave the CPU. Combined with
> > the high core counts typical of modern server platforms, this results
> > in transition completion times that are not only excessively prolonged
> > but also highly unpredictable.
> >
> > This patch resolves this issue by registering a callback with
> > stop_machine. The callback attempts to transition the associated running
> > task. In a VM environment configured with 32 CPUs, the live patching
> > operation completes promptly after the SIGNALS_TIMEOUT period with this
> > patch applied; without it, the process nearly fails to complete under
> > the same scenario.
> >
> > Co-developed-by: Rui Qi <qirui.001@...edance.com>
> > Signed-off-by: Rui Qi <qirui.001@...edance.com>
> > Signed-off-by: Li Zhe <lizhe.67@...edance.com>
>
> PeterZ, what's your take on this?
>
> I wonder if we could instead do resched_cpu() or something similar to
> trigger the call to klp_sched_try_switch() in __schedule()?
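
klp_sched_try_switch() only invokes __klp_sched_try_switch() after
verifying that the task's state has TASK_FREEZABLE set. A task that is
merely preempted is still in TASK_RUNNING and does not pass that check,
so I am not sure that forcing a reschedule with resched_cpu() would
resolve the issue for these always-running vCPU threads.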
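
To illustrate the concern, here is a minimal user-space sketch of the gating
described above. It is not the kernel code: the MODEL_* state values, the
model_task structure and every model_* function are hypothetical stand-ins,
and the model assumes that the stack check inside __klp_sched_try_switch()
always succeeds, which the real code does not guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state bits for this model only; the real definitions
 * live in the kernel headers and may differ. */
#define MODEL_TASK_RUNNING        0x0000u
#define MODEL_TASK_INTERRUPTIBLE  0x0001u
#define MODEL_TASK_FREEZABLE      0x2000u

/* Hypothetical, heavily reduced view of a task. */
struct model_task {
	unsigned int state;   /* models the task state word */
	bool patch_pending;   /* models TIF_PATCH_PENDING */
};

/* Models the static key that is enabled while a transition is running. */
static bool model_transition_active = true;

/* Models __klp_sched_try_switch(). Assumption: the stack check passes,
 * so the pending flag is simply cleared. */
void model_do_switch(struct model_task *t)
{
	t->patch_pending = false;
}

/* Models the gate in klp_sched_try_switch(): the switch is attempted
 * only when the task state carries the FREEZABLE bit.
 * Returns true if the switch was attempted. */
bool model_klp_sched_try_switch(struct model_task *curr)
{
	if (model_transition_active &&
	    (curr->state & MODEL_TASK_FREEZABLE)) {
		model_do_switch(curr);
		return true;
	}
	return false;
}

/* Usage sketch: a preempted, still-runnable task versus a freezable
 * sleeper, both entering the modelled scheduler hook. */
void model_demo(void)
{
	struct model_task vcpu = { MODEL_TASK_RUNNING, true };
	struct model_task sleeper = {
		MODEL_TASK_INTERRUPTIBLE | MODEL_TASK_FREEZABLE, true
	};

	assert(!model_klp_sched_try_switch(&vcpu));
	assert(vcpu.patch_pending);      /* still pending after resched */

	assert(model_klp_sched_try_switch(&sleeper));
	assert(!sleeper.patch_pending);  /* switched */
}
```

In this model, a forced reschedule of the runnable task never reaches the
switch attempt, which is the case I am worried about for the pinned vCPU
threads.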

Thanks,
Zhe