Message-ID: <3af3a952-43f9-4625-b87c-f45d14d8228e@paulmck-laptop>
Date: Fri, 14 Nov 2025 10:26:37 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Steven Rostedt <rostedt@...dmis.org>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linux Next Mailing List <linux-next@...r.kernel.org>,
yonghong.song@...ux.dev
Subject: Re: linux-next: manual merge of the rcu tree with the ftrace tree
On Fri, Nov 14, 2025 at 06:41:59PM +0100, Sebastian Andrzej Siewior wrote:
> On 2025-11-14 09:25:06 [-0800], Paul E. McKenney wrote:
> > On Fri, Nov 14, 2025 at 06:10:52PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2025-11-14 09:00:21 [-0800], Paul E. McKenney wrote:
> > > > > > Where in PREEMPT_RT we do not disable preemption around the tracepoint
> > > > > > callback, but in non-RT we do. Instead it uses SRCU and migrate_disable().
> > > > >
> > > > > I appreciate the effort. I really do. But why can't we have SRCU on both
> > > > > configs?
> > > >
> > > > Due to performance concerns for non-RT kernels and workloads, where we
> > > > really need preemption disabled.
> > >
> > > This means srcu_read_lock_notrace() has much more overhead compared to
> > > rcu_read_lock_sched_notrace()?
> > > I am a bit afraid of different bugs here and there.
> >
> > No, the concern is instead overhead due to any actual preemption. So the
> > goal is to actually disable preemption across the BPF program *except*
> > in PREEMPT_RT kernels.
>
> Overhead of actual preemption while the BPF callback of the trace-event
> is invoked?
> So we get rid of the preempt_disable() in the tracepoint which we had
> due to rcu_read_lock_sched_notrace(), but we need to preserve it because
> preemption while the BPF program is invoked would add overhead?
> This is also something we want for CONFIG_PREEMPT (LAZY)?
>
> Sorry to be verbose but I try to catch up.

No need to apologize, given my tendency to be verbose. ;-)
> The BPF invocation does not disable preemption for a long time. It
> disables migration since some code uses per-CPU variables here.
>
> For XDP-type BPF invocations, preemption is disabled (except for RT)
> because those run in NAPI/softirq context.

Before Steven's pair of patches (one of which Frederic and I are handling
due to it depending on not-yet-mainline SRCU-fast commits), BPF programs
attached to tracepoints ran with preemption disabled. This behavior is
still in mainline. As you reported some time back, this caused problems
for PREEMPT_RT, hence Steven's pair of patches. But although we do want
to fix PREEMPT_RT, we don't want to break other kernel configurations,
hence keeping preemption disabled in non-PREEMPT_RT kernels.

Now perhaps Yonghong will tell us that this has since been shown to not
be a problem for BPF programs attached to tracepoints in non-PREEMPT_RT
kernels. But he has not yet done so, which strongly suggests we keep
the known-to-work preemption-disabled status of BPF programs attached
to tracepoints.
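
To make the intended split concrete, here is a minimal sketch of the
config-dependent critical section around a tracepoint-attached BPF
program.  The helper names and the srcu_struct parameter are hypothetical
illustrations for this thread, not what Steven's actual patches use:

/* Hypothetical sketch, not the actual patch. */
#ifdef CONFIG_PREEMPT_RT
/* RT: stay preemptible, rely on SRCU plus migrate_disable(). */
static inline int trace_bpf_enter(struct srcu_struct *ssp)
{
        int idx = srcu_read_lock_notrace(ssp);

        migrate_disable();
        return idx;
}

static inline void trace_bpf_exit(struct srcu_struct *ssp, int idx)
{
        migrate_enable();
        srcu_read_unlock_notrace(ssp, idx);
}
#else
/* !RT: keep the historical preemption-disabled behavior. */
static inline int trace_bpf_enter(struct srcu_struct *ssp)
{
        preempt_disable_notrace();
        return 0;
}

static inline void trace_bpf_exit(struct srcu_struct *ssp, int idx)
{
        preempt_enable_notrace();
}
#endif
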
> > > > > Also why does tracepoint_guard() need to disable migration? The BPF
> > > > > program already disables migration (see for instance
> > > > > bpf_prog_run_array()).
> > > > > This is true for RT and !RT. So there is no need to do it here.
> > > >
> > > > The addition of migration disabling was in response to failures, which
> > > > this fixed. Or at least greatly reduced the probability of failure! Let's see...
> > > > That migrate_disable() has been there since 2022, so the failures were
> > > > happening despite it. Adding Yonghong on CC for his perspective.
> > >
> > > Okay. In general I would prefer that we know why we do it. BPF had
> > > preempt_disable() which was turned into migrate_disable() for RT reasons
> > > since remaining on the same CPU was enough and preempt_disable() was the
> > > only way to enforce it at the time.
> > > And I think Linus requested migrate_disable() to work regardless of RT
> > > which PeterZ made happen (for different reasons, not BPF related).
> >
> > Yes, migrate_disable() prevents migration either way, but it does not
> > prevent preemption, which is what was needed in non-PREEMPT_RT kernels
> > last I checked.
>
> BPF in general sometimes relies on per-CPU variables. Sometimes it is
> needed to avoid reentrancy, which is what preempt_disable() provides for
> the same context. This is usually handled where it is required, and when
> it is removed, it is added back shortly. See for instance
> https://lore.kernel.org/all/20251114064922.11650-1-chandna.sahil@gmail.com/
>
> :)

Agreed, and that was why I added the migrate_disable() calls earlier,
calls that in Steven's more recent version of this patch just now
conflicted with Steven's other patch in -next. ;-)
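
For reference, the pattern Sebastian is pointing at is roughly the one
bpf_prog_run_array() already uses; the following is a simplified sketch
of that existing pattern (details trimmed, not the actual source), where
migration is disabled around the program run so its per-CPU state stays
on one CPU, while preemption handling is left to the surrounding context:

/* Simplified sketch of the existing pattern, not the actual source. */
static u32 run_one_bpf_prog(const struct bpf_prog *prog, const void *ctx)
{
        u32 ret;

        migrate_disable();      /* Keep per-CPU state on this CPU. */
        rcu_read_lock();        /* Protect the program while it runs. */
        ret = bpf_prog_run(prog, ctx);
        rcu_read_unlock();
        migrate_enable();
        return ret;
}
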
Thanx, Paul