Message-ID: <20260112085257.26bb7b5b@fedora>
Date: Mon, 12 Jan 2026 08:53:19 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, LKML
<linux-kernel@...r.kernel.org>, Linux trace kernel
<linux-trace-kernel@...r.kernel.org>, bpf <bpf@...r.kernel.org>, Masami
Hiramatsu <mhiramat@...nel.org>, "Paul E. McKenney" <paulmck@...nel.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Thomas Gleixner
<tglx@...utronix.de>
Subject: Re: [PATCH v5] tracing: Guard __DECLARE_TRACE() use of
__DO_TRACE_CALL() with SRCU-fast
On Sun, 11 Jan 2026 15:38:38 -0800
Alexei Starovoitov <alexei.starovoitov@...il.com> wrote:
> > Oh, so you are OK replacing the preempt_disable in the tracepoint
> > callbacks with fast SRCU?
>
> yes, but..
>
> > Then I guess we can simply do that. Would it be fine to do that for
> > both RT and non-RT? That will simplify the code quite a bit.
>
> Agree. perf needs preempt_disable in their callbacks (as this patch does)
> and bpf side needs to add migrate_disable in __bpf_trace_run for now.
> Though syscall tracepoints are sleepable we don't take advantage of
> that on the bpf side. Eventually we will, and then rcu_lock
> inside __bpf_trace_run will become srcu_fast_lock.
>
> The way to think about generic infrastructure like tracepoints is
> to minimize their overhead no matter what out-of-tree and in-tree
> users' assumptions are today, so why do we need preempt_disable
> or srcu_fast there?
Either preempt disable or srcu_fast is still needed.
> I think today it's there because all callbacks (perf, ftrace, bpf)
> expect preemption to be disabled, but can we just remove it from tp side?
> and move preempt_disable to callbacks that actually need it?
Yes, if we are talking about switching from preempt_disable to srcu_fast.
No, if you mean removing both, as it still needs RCU protection. It's
used for synchronizing changes in the tracepoint infrastructure itself.
That __DO_TRACE_CALL() macro is the guts of the tracepoint callback
code. It needs to handle races with callbacks being added and removed,
because each callback also has its own data passed to it. If things get
out of sync, a callback could be called with another callback's data.
That's why it has:
	it_func_ptr =						\
		rcu_dereference_raw((&__tracepoint_##name)->funcs);
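To give a rough picture of why, the walk over the callbacks looks
something like this (paraphrased and simplified from
include/linux/tracepoint.h; the real macro also has a static_call()
fast path, so don't take this as the literal code):

	struct tracepoint_func *it_func_ptr;
	void *it_func, *__data;

	/* Must only be dereferenced under RCU (or SRCU-fast) protection */
	it_func_ptr = rcu_dereference_raw((&__tracepoint_##name)->funcs);
	if (it_func_ptr) {
		do {
			it_func = READ_ONCE(it_func_ptr->func);
			__data  = it_func_ptr->data;
			/* Each callback is called with its own data pointer */
			((void (*)(void *, proto))it_func)(__data, args);
		} while ((++it_func_ptr)->func);
	}

If a callback and its data were read from different versions of that
array, you would get exactly the mismatch described above.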
>
> I'm looking at release_probes(). It's fine as-is, no?
That's just the freeing path, and as you can see, RCU synchronization
is required there.
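In other words, the update side follows the usual RCU publish-then-free
pattern, roughly like this (just the idea, not the literal
kernel/tracepoint.c code):

	old = tp->funcs;
	rcu_assign_pointer(tp->funcs, new);	/* publish the new callback array */
	/*
	 * release_probes() then defers the kfree() of 'old' past a grace
	 * period (call_rcu()/call_srcu()), so a reader that is still
	 * walking the old array under RCU/SRCU-fast remains safe.
	 */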
Updates to tracepoints require RCU protection. It started out with
preempt_disable() for all tracepoints, which was synchronized with
synchronize_sched() (which later became just synchronize_rcu()).
The issue that came about was that both ftrace and perf assumed their
tracepoint callbacks are always called with preemption disabled. With a
move to srcu_fast(), that is no longer the case. And we need that move
for PREEMPT_RT if BPF is going to run very long callbacks from the
tracepoints.
Ftrace has been fixed to not require it, but it still needs to know
whether or not the tracepoint disabled preemption, so that it can
correctly display the preempt_count() from when the tracepoint was
called.
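Concretely, the event recording has to subtract the preempt_disable()
that the tracepoint wrapper itself added, roughly like this
(illustrative only, the names below are mine and not the actual ftrace
code):

	/* Illustrative sketch -- hypothetical names, not the real ftrace code */
	static unsigned int event_record_preempt_count(bool tp_disabled_preemption)
	{
		unsigned int pc = preempt_count();

		/*
		 * If the tracepoint wrapper disabled preemption before calling
		 * the callback, subtract that level so the trace shows the
		 * preempt_count() of the actual caller, not of the tracing code.
		 */
		if (tp_disabled_preemption)
			pc--;

		return pc;
	}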
Perf is trivial to fix as it can simply add a preempt_disable() in its
handler.
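Something along these lines in the generated perf handler should be
enough (just a sketch; "foo" and its argument stand in for whatever the
event's proto is):

	/* Sketch: wrap the existing handler body with preempt disable */
	static notrace void perf_trace_foo(void *__data, int arg)
	{
		preempt_disable_notrace();
		/*
		 * ... existing perf_trace_##call body: set up the raw data
		 * buffer, fill in the entry, and submit it to perf ...
		 */
		preempt_enable_notrace();
	}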
But we were originally told that BPF had an issue because it assumed
that tracepoint callbacks would not migrate. That's how this patch came
about.
Now if you are saying that BPF will handle migrate_disable() on its own
and not require the tracepoint infrastructure to do it on its behalf,
then this is perfect. I can then simplify this code and just use
srcu_fast for both RT and !RT.
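Roughly, I'd expect the guard in __DECLARE_TRACE() to collapse to
something like this (just a sketch; the srcu_struct name and the exact
srcu_read_{lock,unlock}_fast() calls are my assumptions here, not the
final patch):

	/* Sketch of a single guard for both RT and !RT -- not the actual patch */
	static inline void trace_##name(proto)				\
	{								\
		if (static_branch_unlikely(&__tracepoint_##name.key)) {	\
			struct srcu_ctr __percpu *scp;			\
									\
			scp = srcu_read_lock_fast(&tracepoint_srcu);	\
			__DO_TRACE_CALL(name, TP_ARGS(args));		\
			srcu_read_unlock_fast(&tracepoint_srcu, scp);	\
		}							\
	}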
-- Steve