linux-kernel - Re: [PATCH] klp: use stop machine to check and expedite transition for running tasks

Message-Id: <20260204024756.6776-1-lizhe.67@bytedance.com>
Date: Wed,  4 Feb 2026 10:47:56 +0800
From: "Li Zhe" <lizhe.67@...edance.com>
To: <jpoimboe@...nel.org>
Cc: <jikos@...nel.org>, <joe.lawrence@...hat.com>, 
	<linux-kernel@...r.kernel.org>, <live-patching@...r.kernel.org>, 
	<lizhe.67@...edance.com>, <mbenes@...e.cz>, <peterz@...radead.org>, 
	<pmladek@...e.com>, <qirui.001@...edance.com>
Subject: Re: [PATCH] klp: use stop machine to check and expedite transition for running tasks

On Tue, 3 Feb 2026 18:20:22 -0800, jpoimboe@...nel.org wrote:
 
> On Mon, Feb 02, 2026 at 05:13:34PM +0800, Li Zhe wrote:
> > In the current KLP transition implementation, the strategy for running
> > tasks relies on waiting for a context switch, at which point the
> > TIF_PATCH_PENDING flag may be cleared. Alternatively, once the task has
> > yielded the CPU, its stack can be inspected to determine whether the
> > flag can be cleared. However, this approach proves problematic in
> > certain environments.
> > 
> > Consider a scenario where most of the host CPUs are configured with
> > nohz_full and isolcpus, each dedicated to a VM whose vCPU is pinned to
> > that physical core, with idle=poll set inside the guest. Under these
> > conditions the vCPU threads rarely leave their CPUs. Combined with the
> > high core counts of modern server platforms, this makes transition
> > completion times both excessively long and highly unpredictable.
> > 
> > This patch addresses the issue by registering a callback with
> > stop_machine; the callback attempts to transition the associated
> > running task. In a VM environment with 32 CPUs, the live-patching
> > operation completes promptly once the SIGNALS_TIMEOUT period has
> > elapsed with this patch applied; without it, the transition almost
> > never completes in the same scenario.
> > 
> > Co-developed-by: Rui Qi <qirui.001@...edance.com>
> > Signed-off-by: Rui Qi <qirui.001@...edance.com>
> > Signed-off-by: Li Zhe <lizhe.67@...edance.com>
> 
> PeterZ, what's your take on this?
> 
> I wonder if we could instead do resched_cpu() or something similar to
> trigger the call to klp_sched_try_switch() in __schedule()?
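The mechanism described in the patch can be pictured with a small user-space C model. This is a sketch under stated assumptions, not the patch's code: `model_task`, `model_cpu`, `try_transition_curr()` and `run_on_all_cpus()` are hypothetical names, the `stack_safe` field stands in for the stack check, and the plain loop stands in for stop_machine() invoking the callback on each CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: NOT kernel code and NOT the patch under review. */
#define MODEL_NR_CPUS 4

struct model_task {
	bool patch_pending; /* stands in for TIF_PATCH_PENDING */
	bool stack_safe;    /* stands in for "no to-be-patched function on the stack" */
};

struct model_cpu {
	struct model_task *curr; /* task that was running on this CPU */
};

/*
 * Per-CPU callback, modelled on the idea in the patch description:
 * try to transition the task associated with this CPU.
 * Returns 0 if nothing is left pending on this CPU, -1 otherwise.
 */
static int try_transition_curr(struct model_cpu *cpu)
{
	struct model_task *t = cpu->curr;

	if (t == NULL || !t->patch_pending)
		return 0;              /* nothing to do on this CPU */
	if (!t->stack_safe)
		return -1;             /* stack not safe yet: leave it pending */
	t->patch_pending = false;  /* task now considered transitioned */
	return 0;
}

/* Stand-in for stop_machine(): run the callback on every modelled CPU.
 * Returns the number of CPUs whose task is still pending. */
static int run_on_all_cpus(struct model_cpu cpus[], size_t n)
{
	int still_pending = 0;

	assert(n <= MODEL_NR_CPUS);
	for (size_t i = 0; i < n; i++)
		if (try_transition_curr(&cpus[i]) != 0)
			still_pending++;
	return still_pending;
}
```

In this model, tasks whose stacks are not yet safe are simply left pending, so the caller would need to retry later.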

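klp_sched_try_switch() only invokes __klp_sched_try_switch() after
verifying that the task's state has TASK_FREEZABLE set. A vCPU thread
that is merely preempted via resched_cpu() would normally still be in
TASK_RUNNING, so it would not pass this check. I therefore remain
uncertain whether this approach adequately resolves the issue.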
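The gating described above can be illustrated with another hypothetical user-space model. The state constants are placeholders chosen for this sketch, not the kernel's values; any other conditions in the real helper are omitted; and `model_inner_try_switch()` simply assumes the stack check succeeds. The point it shows is that a task which is merely preempted while running never reaches the inner helper, whereas a task sleeping in a freezable state does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state bits for this model only; not the kernel's values. */
#define MODEL_TASK_RUNNING       0x0000u
#define MODEL_TASK_INTERRUPTIBLE 0x0001u
#define MODEL_TASK_FREEZABLE     0x2000u

struct model_sched_task {
	unsigned int state;  /* stands in for task_struct::__state */
	bool patch_pending;  /* stands in for TIF_PATCH_PENDING */
};

/* Counts how often the inner helper ran, so the gating can be observed. */
static int inner_calls;

/* Stand-in for __klp_sched_try_switch(): assume the stack check succeeds. */
static void model_inner_try_switch(struct model_sched_task *curr)
{
	inner_calls++;
	curr->patch_pending = false;
}

/* Stand-in for klp_sched_try_switch(): only freezable sleepers get through. */
static void model_sched_try_switch(struct model_sched_task *curr)
{
	assert(curr != 0);
	if (curr->state & MODEL_TASK_FREEZABLE)
		model_inner_try_switch(curr);
}
```

Calling `model_sched_try_switch()` on a task whose state is `MODEL_TASK_RUNNING` (as for a task preempted after a reschedule request) leaves its pending flag set, while a task in `MODEL_TASK_INTERRUPTIBLE | MODEL_TASK_FREEZABLE` is switched.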

Thanks,
Zhe