Message-ID: <20251114112202.08e1e3c1@gandalf.local.home>
Date: Fri, 14 Nov 2025 11:22:02 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Stephen Rothwell <sfr@...b.auug.org.au>, "Paul E. McKenney"
<paulmck@...nel.org>, Frederic Weisbecker <frederic@...nel.org>, Neeraj
Upadhyay <neeraj.upadhyay@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
Uladzislau Rezki <urezki@...il.com>, Masami Hiramatsu
<mhiramat@...nel.org>, Linux Kernel Mailing List
<linux-kernel@...r.kernel.org>, Linux Next Mailing List
<linux-next@...r.kernel.org>
Subject: Re: linux-next: manual merge of the rcu tree with the ftrace tree
On Fri, 14 Nov 2025 17:00:17 +0100
Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:
> > It doesn't disable preemption, but it is here to keep the latency
> > preempt_count accounting the same in both PREEMPT_RT and non-PREEMPT_RT.
> > You know, the stuff that shows up in the trace:
> >
> > "d..4."
>
> urgh.
>
> We did that to make the tracer match reality. Since the tracer
> disabled preemption, we decremented the counter from preempt_count to
> record what was there before the tracepoint started changing it.
> That was tracing_gen_ctx_dec(). Now I see we have something similar in
> tracing_gen_ctx_dec_cond().
> But why do we need to disable migration here? Why isn't !RT affected by
> this? I remember someone had a trace where the NMI flag was set and
> migrate disable was at max, which sounds like someone decremented the
> migrate_disable counter while migration wasn't disabled…
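For reference, tracing_gen_ctx_dec() just backs out the count that the
tracer itself added, so the recorded context matches what the traced code
had. A simplified sketch of that idea (paraphrased, not the exact source):

static inline unsigned int tracing_gen_ctx_dec_sketch(void)
{
	unsigned int trace_ctx = tracing_gen_ctx();

	/*
	 * The tracepoint did a preempt_disable() before calling the
	 * callback; subtract that one count so the event records the
	 * state of the code being traced, not of the tracer itself.
	 */
	if (IS_ENABLED(CONFIG_PREEMPTION))
		trace_ctx--;
	return trace_ctx;
}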
The migrate disable is to match this code:
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -100,6 +100,25 @@ void for_each_tracepoint_in_module(struct module *mod,
}
#endif /* CONFIG_MODULES */
+/*
+ * BPF programs can attach to the tracepoint callbacks. But if the
+ * callbacks are called with preemption disabled, the BPF programs
+ * can cause quite a bit of latency. When PREEMPT_RT is enabled,
+ * instead of disabling preemption, use srcu_fast_notrace() for
+ * synchronization. As BPF programs that are attached to tracepoints
+ * expect to stay on the same CPU, also disable migration.
+ */
+#ifdef CONFIG_PREEMPT_RT
+extern struct srcu_struct tracepoint_srcu;
+# define tracepoint_sync() synchronize_srcu(&tracepoint_srcu);
+# define tracepoint_guard() \
+ guard(srcu_fast_notrace)(&tracepoint_srcu); \
+ guard(migrate)()
+#else
+# define tracepoint_sync() synchronize_rcu();
+# define tracepoint_guard() guard(preempt_notrace)()
+#endif
+
With PREEMPT_RT, we do not disable preemption around the tracepoint
callback as we do in non-RT; instead it uses SRCU and migrate disable.
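Conceptually, the protection around a normal callback invocation then
looks something like this (a hypothetical illustration, not the actual
__DO_TRACE code):

static void example_trace_invoke(void)
{
	tracepoint_guard();	/* RT: SRCU fast guard + migrate guard;
				 * !RT: preempt_notrace guard */

	/* ... invoke the attached tracepoint callbacks here ... */
}	/* the scope-based guards release here */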
The migrate_disable() in the syscall tracepoint (which gets called by the
system call version that doesn't disable migration, even in RT) is needed
so that the accounting that happens in trace_event_buffer_reserve()
matches what happens when that function gets called by a normal
tracepoint callback.
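Roughly, the syscall probe has to do this itself (hypothetical names,
just showing the shape):

static void syscall_probe_sketch(struct trace_event_file *trace_file,
				 unsigned long size)
{
	struct trace_event_buffer fbuffer;
	void *entry;

	/*
	 * The syscall path did not take tracepoint_guard(), so disable
	 * migration here to match what the normal callback path provides
	 * before the buffer accounting runs.
	 */
	guard(migrate)();

	entry = trace_event_buffer_reserve(&fbuffer, trace_file, size);
	if (!entry)
		return;

	/* fill in the event fields from entry, then commit */
	trace_event_buffer_commit(&fbuffer);
}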
-- 
Steve