linux-kernel - Re: [PATCH v6 23/29] context-tracking: Introduce work deferral infrastructure

Message-ID: <aQIqSq2FWyHg0Y3p@localhost.localdomain>
Date: Wed, 29 Oct 2025 15:52:58 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Valentin Schneider <vschneid@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, rcu@...r.kernel.org,
	x86@...nel.org, linux-arm-kernel@...ts.infradead.org,
	loongarch@...ts.linux.dev, linux-riscv@...ts.infradead.org,
	linux-arch@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
	Nicolas Saenz Julienne <nsaenzju@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"H. Peter Anvin" <hpa@...or.com>, Andy Lutomirski <luto@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Josh Poimboeuf <jpoimboe@...nel.org>,
	Paolo Bonzini <pbonzini@...hat.com>, Arnd Bergmann <arnd@...db.de>,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Jason Baron <jbaron@...mai.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ard Biesheuvel <ardb@...nel.org>,
	Sami Tolvanen <samitolvanen@...gle.com>,
	"David S. Miller" <davem@...emloft.net>,
	Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
	Joel Fernandes <joelagnelf@...dia.com>,
	Josh Triplett <josh@...htriplett.org>,
	Boqun Feng <boqun.feng@...il.com>,
	Uladzislau Rezki <urezki@...il.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Mel Gorman <mgorman@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Masahiro Yamada <masahiroy@...nel.org>,
	Han Shen <shenhan@...gle.com>, Rik van Riel <riel@...riel.com>,
	Jann Horn <jannh@...gle.com>,
	Dan Carpenter <dan.carpenter@...aro.org>,
	Oleg Nesterov <oleg@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
	Clark Williams <williams@...hat.com>,
	Yair Podemsky <ypodemsk@...hat.com>,
	Marcelo Tosatti <mtosatti@...hat.com>,
	Daniel Wagner <dwagner@...e.de>, Petr Tesarik <ptesarik@...e.com>
Subject: Re: [PATCH v6 23/29] context-tracking: Introduce work deferral
 infrastructure

On Wed, Oct 29, 2025 at 11:09:50AM +0100, Valentin Schneider wrote:
> On 28/10/25 15:00, Frederic Weisbecker wrote:
> > On Fri, Oct 10, 2025 at 05:38:33PM +0200, Valentin Schneider wrote:
> >> +	old = atomic_read(&ct->state);
> >> +
> >> +	/*
> >> +	 * The work bit must only be set if the target CPU is not executing
> >> +	 * in kernelspace.
> >> +	 * CT_RCU_WATCHING is used as a proxy for that - if the bit is set, we
> >> +	 * know for sure the CPU is executing in the kernel whether that be in
> >> +	 * NMI, IRQ or process context.
> >> +	 * Set CT_RCU_WATCHING here and let the cmpxchg do the check for us;
> >> +	 * the state could change between the atomic_read() and the cmpxchg().
> >> +	 */
> >> +	old |= CT_RCU_WATCHING;
> >
> > Most of the time, the task should be either idle or in userspace. I'm still not
> > sure why you start with a bet that the CPU is in the kernel with RCU watching.
> >
> 
> Right I think I got that the wrong way around when I switched to using
> CT_RCU_WATCHING vs CT_STATE_KERNEL. That wants to be
> 
>   old &= ~CT_RCU_WATCHING;
> 
> i.e. bet the CPU is NOHZ-idle, if it's not the cmpxchg fails and we don't
> store the work bit.

Right.
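
For reference, the corrected bet can be sketched in userspace with C11
atomics instead of the kernel's atomic_t API. This is only a rough
illustration under stated assumptions: the bit positions and the
ct_set_deferred_work() name are hypothetical and are not taken from the
patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout, for illustration only; not the kernel's values. */
#define CT_RCU_WATCHING  (1u << 0)   /* set while the CPU runs in the kernel */
#define CT_WORK_START    8           /* first bit of the deferred-work field */

static_assert(CT_WORK_START > 0, "work bits must not overlap CT_RCU_WATCHING");

/*
 * Try to store deferred work in a (simulated) remote CPU state word.
 * Returns true if the work bit was stored, false if the CPU was found
 * in the kernel, in which case the caller has to fall back (e.g. IPI).
 */
static bool ct_set_deferred_work(atomic_uint *state, unsigned int work)
{
	unsigned int old = atomic_load(state);
	bool ret;

	/* Bet that the CPU is not in the kernel: expect the bit clear. */
	old &= ~CT_RCU_WATCHING;

	do {
		/* On failure, 'old' is refreshed with the current value. */
		ret = atomic_compare_exchange_strong(state, &old,
						     old | (work << CT_WORK_START));
	} while (!ret && !(old & CT_RCU_WATCHING));

	return ret;
}
```

With this bet, a CPU that has RCU watching makes the first compare-exchange
fail, the refreshed value has CT_RCU_WATCHING set, and the loop exits
without storing the work bit.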

> 
> >> +	/*
> >> +	 * Try setting the work until either
> >> +	 * - the target CPU has entered kernelspace
> >> +	 * - the work has been set
> >> +	 */
> >> +	do {
> >> +		ret = atomic_try_cmpxchg(&ct->state, &old, old | (work << CT_WORK_START));
> >> +	} while (!ret && !(old & CT_RCU_WATCHING));
> >
> > So this applies blindly to idle as well, right? It should work but note that
> > idle entry code before RCU watches is also fragile.
> >
> 
> Yeah I remember losing some hair trying to grok the idle entry situation;
> we could keep this purely NOHZ_FULL and have the deferral condition be:
> 
>   (ct->state & CT_STATE_USER) && !(ct->state & CT_RCU_WATCHING)

Well, after all, what works for NOHZ_FULL should also work for idle. Idle is
preceded by entry code as well (or rather __cpuidle code).
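
To make the two options concrete, here is a small sketch of both deferral
predicates. It assumes, purely for illustration, that the context states
are independent flag bits in the state word; the values and the
ct_can_defer_*() names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag layout, for illustration only; not the kernel's values. */
#define CT_RCU_WATCHING  (1u << 0)
#define CT_STATE_USER    (1u << 1)
#define CT_STATE_IDLE    (1u << 2)

static_assert((CT_STATE_USER & CT_RCU_WATCHING) == 0, "flags must be distinct");

/* NOHZ_FULL-only variant: defer only for userspace CPUs without RCU watching. */
static bool ct_can_defer_user_only(unsigned int state)
{
	return (state & CT_STATE_USER) && !(state & CT_RCU_WATCHING);
}

/* Variant that also covers idle: defer whenever RCU is not watching. */
static bool ct_can_defer_any(unsigned int state)
{
	return !(state & CT_RCU_WATCHING);
}
```

The second predicate is the simpler one, since it treats idle the same way
as userspace and relies only on the RCU watching bit.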

Thanks.

-- 
Frederic Weisbecker
SUSE Labs