Message-ID: <20251114174159.V60vTd1q@linutronix.de>
Date: Fri, 14 Nov 2025 18:41:59 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Steven Rostedt <rostedt@...dmis.org>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux Next Mailing List <linux-next@...r.kernel.org>,
yonghong.song@...ux.dev
Subject: Re: linux-next: manual merge of the rcu tree with the ftrace tree
On 2025-11-14 09:25:06 [-0800], Paul E. McKenney wrote:
> On Fri, Nov 14, 2025 at 06:10:52PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2025-11-14 09:00:21 [-0800], Paul E. McKenney wrote:
> > > > > Where in PREEMPT_RT we do not disable preemption around the tracepoint
> > > > > callback, but in non RT we do. Instead it uses a srcu and migrate disable.
> > > >
> > > > I appreciate the effort. I really do. But why can't we have SRCU on both
> > > > configs?
> > >
> > > Due to performance concerns for non-RT kernels and workloads, where we
> > > really need preemption disabled.
> >
> > This means srcu_read_lock_notrace() is much more overhead compared to
> > rcu_read_lock_sched_notrace()?
> > I am a bit afraid of different bugs here and there.
>
> No, the concern is instead overhead due to any actual preemption. So the
> goal is to actually disable preemption across the BPF program *except*
> in PREEMPT_RT kernels.
Overhead from actual preemption while the BPF callback of the trace-event
is invoked?
So we get rid of the preempt_disable() in the trace-point, which we had
due to rcu_read_lock_sched_notrace(), but we need to preserve it because
of the preemption that could otherwise occur while the BPF program is
invoked?
Is this also something we want for CONFIG_PREEMPT (LAZY)?
Sorry to be verbose, but I am trying to catch up.
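To check my understanding, a rough sketch of the two read-side schemes as
I read them (tp_srcu and do_bpf_callback() are just placeholders here,
not existing symbols):

	#include <linux/rcupdate.h>
	#include <linux/srcu.h>

	static void do_bpf_callback(void);	/* placeholder */

	DEFINE_STATIC_SRCU(tp_srcu);

	static void tp_call_old(void)
	{
		/* old scheme: sched-RCU read side implies preemption off */
		rcu_read_lock_sched_notrace();
		do_bpf_callback();
		rcu_read_unlock_sched_notrace();
	}

	static void tp_call_new(void)
	{
		int idx;

		/* new scheme: SRCU read side, the callback may be preempted */
		idx = srcu_read_lock_notrace(&tp_srcu);
		do_bpf_callback();
		srcu_read_unlock_notrace(&tp_srcu, idx);
	}
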
The BPF invocation has not disabled preemption for a long time now. It
disables migration instead, since some of the code uses per-CPU variables.
For XDP-style BPF invocations, preemption is disabled (except on RT)
because those run in NAPI/softirq context.
> > > > Also why does tracepoint_guard() need to disable migration? The BPF
> > > > program already disables migrations (see for instance
> > > > bpf_prog_run_array()).
> > > > This is true for RT and !RT. So there is no need to do it here.
> > >
> > > The addition of migration disabling was in response to failures, which
> > > this fixed. Or at least greatly reduced the probability of! Let's see...
> > > That migrate_disable() has been there since 2022, so the failures were
> > > happening despite it. Adding Yonghong on CC for his perspective.
> >
> > Okay. In general I would prefer that we know why we do it. BPF had
> > preempt_disable() which was turned into migrate_disable() for RT reasons
> > since remaining on the same CPU was enough and preempt_disable() was the
> > only way to enforce it at the time.
> > And I think Linus requested migrate_disable() to work regardless of RT
> > which PeterZ made happen (for different reasons, not BPF related).
>
> Yes, migrate_disable() prevents migration either way, but it does not
> prevent preemption, which is what was needed in non-PREEMPT_RT kernels
> last I checked.
BPF in general sometimes relies on per-CPU variables. Sometimes it needs
to avoid reentrancy, which is what preempt_disable() provides within the
same context. This is usually handled where it is required, and when it
gets removed, it is added back shortly afterwards. See for instance
https://lore.kernel.org/all/20251114064922.11650-1-chandna.sahil@gmail.com/
:)
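i.e. the usual pattern around per-CPU scratch data (hypothetical example,
"scratch" and "some_state" are made up):

	#include <linux/percpu.h>
	#include <linux/preempt.h>

	struct some_state {
		int nesting;
	};

	static DEFINE_PER_CPU(struct some_state, scratch);

	static void use_scratch(void)
	{
		struct some_state *s;

		/*
		 * Without preempt_disable() a task could be preempted here
		 * and another task on the same CPU could touch the same
		 * per-CPU "scratch" -> reentrancy on the data.
		 */
		preempt_disable();
		s = this_cpu_ptr(&scratch);
		s->nesting++;
		/* ... exclusive use of *s on this CPU ... */
		s->nesting--;
		preempt_enable();
	}
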
>
> Thanx, Paul
Sebastian