Message-ID: <20251114112202.08e1e3c1@gandalf.local.home>
Date: Fri, 14 Nov 2025 11:22:02 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Cc: Stephen Rothwell <sfr@...b.auug.org.au>, "Paul E. McKenney"
 <paulmck@...nel.org>, Frederic Weisbecker <frederic@...nel.org>, Neeraj
 Upadhyay <neeraj.upadhyay@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
 Uladzislau Rezki <urezki@...il.com>, Masami Hiramatsu
 <mhiramat@...nel.org>, Linux Kernel Mailing List
 <linux-kernel@...r.kernel.org>, Linux Next Mailing List
 <linux-next@...r.kernel.org>
Subject: Re: linux-next: manual merge of the rcu tree with the ftrace tree

On Fri, 14 Nov 2025 17:00:17 +0100
Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:

> > It doesn't disable preemption, but it is here to keep the latency
> > preempt_count accounting the same in both PREEMPT_RT and non-PREEMPT_RT.
> > You know, the stuff that shows up in the trace:
> > 
> >   "d..4."  
> 
> urgh.
> 
> We did that to match reality with the tracer. Since the tracer
> disabled preemption, we decremented the preempt_count value to
> record what was there before the tracepoint started changing it.
> That was tracing_gen_ctx_dec(). Now I see we have something similar in
> tracing_gen_ctx_dec_cond().
> But why do we need to disable migration here? Why isn't !RT affected by
> this? I remember someone had a trace where the NMI was set and migrate
> disable was at max, which sounds like someone decremented the
> migrate_disable counter while migration wasn't disabled…

It's to match this code:

--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -100,6 +100,25 @@ void for_each_tracepoint_in_module(struct module *mod,
 }
 #endif /* CONFIG_MODULES */
 
+/*
+ * BPF programs can attach to the tracepoint callbacks. But if the
+ * callbacks are called with preemption disabled, the BPF programs
+ * can cause quite a bit of latency. When PREEMPT_RT is enabled,
+ * instead of disabling preemption, use srcu_fast_notrace() for
+ * synchronization. As BPF programs that are attached to tracepoints
+ * expect to stay on the same CPU, also disable migration.
+ */
+#ifdef CONFIG_PREEMPT_RT
+extern struct srcu_struct tracepoint_srcu;
+# define tracepoint_sync() synchronize_srcu(&tracepoint_srcu);
+# define tracepoint_guard()                            \
+       guard(srcu_fast_notrace)(&tracepoint_srcu);     \
+       guard(migrate)()
+#else
+# define tracepoint_sync() synchronize_rcu();
+# define tracepoint_guard() guard(preempt_notrace)()
+#endif
+

Where in PREEMPT_RT we do not disable preemption around the tracepoint
callbacks, but in non-RT we do. Instead, RT uses an SRCU read-side section
plus migrate_disable().
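
In other words, at the callsite tracepoint_guard() boils down to roughly
this (illustrative sketch only, not the real expansion, and assuming the
srcu_fast_notrace guard pairs srcu_read_lock_fast_notrace() /
srcu_read_unlock_fast_notrace() and the migrate guard pairs
migrate_disable() / migrate_enable()):

	/* PREEMPT_RT: sleepable section, but the task stays on this CPU */
	scp = srcu_read_lock_fast_notrace(&tracepoint_srcu);
	migrate_disable();
	/* ... call the registered probes (and any attached BPF programs) ... */
	migrate_enable();
	srcu_read_unlock_fast_notrace(&tracepoint_srcu, scp);

	/* !PREEMPT_RT: classic preempt-off section around the probes */
	preempt_disable_notrace();
	/* ... call the registered probes ... */
	preempt_enable_notrace();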

The migrate_disable() in the syscall tracepoint (which is called from the
system call path that does not disable migration, even on RT) is needed so
that the accounting done in:

  trace_event_buffer_reserve()

matches what happens when that function is called from a normal tracepoint
callback.
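
That is, a simplified sketch of the idea (not the actual code): on RT the
reserve path assumes its caller already bumped the migrate-disable count the
way tracepoint_guard() does, and compensates for that when it generates the
context flags for the event:

	/* Normal tracepoint callback on RT: */
	tracepoint_guard();			/* migrate-disable count +1 */
	trace_event_buffer_reserve(...);	/* compensates for the +1: correct */

	/* Syscall tracepoint without its own migrate_disable(): */
	trace_event_buffer_reserve(...);	/* compensates anyway: bogus count */

Without the added migrate_disable(), that compensation could underflow and
show up as the kind of nonsense near-max migrate-disable value you remember
seeing in a trace.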

-- Steve
