Message-ID: <20251114174159.V60vTd1q@linutronix.de>
Date: Fri, 14 Nov 2025 18:41:59 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Steven Rostedt <rostedt@...dmis.org>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux Next Mailing List <linux-next@...r.kernel.org>,
yonghong.song@...ux.dev
Subject: Re: linux-next: manual merge of the rcu tree with the ftrace tree
On 2025-11-14 09:25:06 [-0800], Paul E. McKenney wrote:
> On Fri, Nov 14, 2025 at 06:10:52PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2025-11-14 09:00:21 [-0800], Paul E. McKenney wrote:
> > > > > Where in PREEMPT_RT we do not disable preemption around the tracepoint
> > > > > callback, but in non RT we do. Instead it uses a srcu and migrate disable.
> > > >
> > > > I appreciate the effort. I really do. But why can't we have SRCU on both
> > > > configs?
> > >
> > > Due to performance concerns for non-RT kernels and workloads, where we
> > > really need preemption disabled.
> >
> > This means srcu_read_lock_notrace() is much more overhead compared to
> > rcu_read_lock_sched_notrace()?
> > I am a bit afraid of different bugs here and there.
>
> No, the concern is instead overhead due to any actual preemption. So the
> goal is to actually disable preemption across the BPF program *except*
> in PREEMPT_RT kernels.
Overhead from actual preemption while the BPF callback of the trace-event
is invoked?
So we get rid of the preempt_disable() in the trace-point, which we had
due to rcu_read_lock_sched_notrace(), but we need to preserve it because
of the preemption that could otherwise occur while the BPF program is
invoked?
Is this also something we want for CONFIG_PREEMPT (LAZY)?
Sorry to be verbose, but I am trying to catch up.
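To check my understanding, a rough sketch of the two read-side schemes as
I read them (tp_srcu and do_bpf_callback() are just placeholders here,
not existing symbols):

	#include <linux/rcupdate.h>
	#include <linux/srcu.h>

	static void do_bpf_callback(void);	/* placeholder */

	DEFINE_STATIC_SRCU(tp_srcu);

	static void tp_call_old(void)
	{
		/* old scheme: sched-RCU read side implies preemption off */
		rcu_read_lock_sched_notrace();
		do_bpf_callback();
		rcu_read_unlock_sched_notrace();
	}

	static void tp_call_new(void)
	{
		int idx;

		/* new scheme: SRCU read side, the callback may be preempted */
		idx = srcu_read_lock_notrace(&tp_srcu);
		do_bpf_callback();
		srcu_read_unlock_notrace(&tp_srcu, idx);
	}
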
The BPF invocation has not disabled preemption for a long time now. It
disables migration instead, since some of the code uses per-CPU variables.
For XDP-style BPF invocations, preemption is disabled (except on RT)
because those run in NAPI/softirq context.
> > > > Also why does tracepoint_guard() need to disable migration? The BPF
> > > > program already disables migrations (see for instance
> > > > bpf_prog_run_array()).
> > > > This is true for RT and !RT. So there is no need to do it here.
> > >
> > > The addition of migration disabling was in response to failures, which
> > > this fixed. Or at least greatly reduced the probability of! Let's see...
> > > That migrate_disable() has been there since 2022, so the failures were
> > > happening despite it. Adding Yonghong on CC for his perspective.
> >
> > Okay. In general I would prefer that we know why we do it. BPF had
> > preempt_disable() which was turned into migrate_disable() for RT reasons
> > since remaining on the same CPU was enough and preempt_disable() was the
> > only way to enforce it at the time.
> > And I think Linus requested migrate_disable() to work regardless of RT
> > which PeterZ made happen (for different reasons, not BPF related).
>
> Yes, migrate_disable() prevents migration either way, but it does not
> prevent preemption, which is what was needed in non-PREEMPT_RT kernels
> last I checked.
BPF in general sometimes relies on per-CPU variables. Sometimes it needs
to avoid reentrancy, which is what preempt_disable() provides within the
same context. This is usually handled where it is required, and when it
gets removed, it is added back shortly afterwards. See for instance
https://lore.kernel.org/all/20251114064922.11650-1-chandna.sahil@gmail.com/
:)
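i.e. the usual pattern around per-CPU scratch data (hypothetical example,
"scratch" and "some_state" are made up):

	#include <linux/percpu.h>
	#include <linux/preempt.h>

	struct some_state {
		int nesting;
	};

	static DEFINE_PER_CPU(struct some_state, scratch);

	static void use_scratch(void)
	{
		struct some_state *s;

		/*
		 * Without preempt_disable() a task could be preempted here
		 * and another task on the same CPU could touch the same
		 * per-CPU "scratch" -> reentrancy on the data.
		 */
		preempt_disable();
		s = this_cpu_ptr(&scratch);
		s->nesting++;
		/* ... exclusive use of *s on this CPU ... */
		s->nesting--;
		preempt_enable();
	}
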
>
> Thanx, Paul
Sebastian