Message-ID: <20251114170232.wHJFS35i@linutronix.de>
Date: Fri, 14 Nov 2025 18:02:32 +0100
From: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Stephen Rothwell <sfr@...b.auug.org.au>,
	"Paul E. McKenney" <paulmck@...nel.org>,
	Frederic Weisbecker <frederic@...nel.org>,
	Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
	Boqun Feng <boqun.feng@...il.com>,
	Uladzislau Rezki <urezki@...il.com>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Linux Next Mailing List <linux-next@...r.kernel.org>
Subject: Re: linux-next: manual merge of the rcu tree with the ftrace tree

On 2025-11-14 11:48:28 [-0500], Steven Rostedt wrote:
> On Fri, 14 Nov 2025 17:33:30 +0100
> Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:
> 
> > > Where in PREEMPT_RT we do not disable preemption around the tracepoint
> > > callback, but in non RT we do. Instead it uses a srcu and migrate disable.  
> > 
> > I appreciate the effort. I really do. But why can't we have SRCU on both
> > configs?
> 
> I don't know. Is there more overhead with disabling migration than
> disabling preemption?

On the first and last invocation, yes. But if disabling migration is
not required for SRCU, then why do it?

We had preemption disabled due to rcu_read_lock_sched(), a tracepoint
requirement which was not spelled out. This appears to be replaced with
srcu_fast(). I just don't see why we need two flavours here (RT vs !RT)
and where the migrate_disable() requirement comes from.
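
To make this concrete, something along these lines is what I would
expect (illustration only; the srcu domain and guard helper names are
made up, not taken from a posted patch):

#include <linux/srcu.h>

DEFINE_STATIC_SRCU(tracepoint_srcu);

static inline int tracepoint_guard_enter(void)
{
	/*
	 * The SRCU read side is preemptible, so neither preempt_disable()
	 * nor migrate_disable() is needed here, provided the callbacks can
	 * cope with being preempted and migrated.
	 */
	return srcu_read_lock(&tracepoint_srcu);
}

static inline void tracepoint_guard_exit(int idx)
{
	srcu_read_unlock(&tracepoint_srcu, idx);
}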

> > 
> > Also why does tracepoint_guard() need to disable migration? The BPF
> > program already disables migrations (see for instance
> > bpf_prog_run_array()).
> 
> We also would need to audit all tracepoint callbacks, as there may be some
> assumptions about staying on the same CPU.

Sure. Okay. What would I need to grep for in order to audit it?
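
For context, the kind of pattern such an audit would look for is a
callback that assumes it stays on one CPU between two per-CPU accesses,
roughly like this (made-up example, not an existing callback):

#include <linux/percpu.h>
#include <linux/smp.h>

struct example_state {
	unsigned long count;
	int last_cpu;
};
static DEFINE_PER_CPU(struct example_state, example_state);

static void example_trace_cb(void *data, unsigned long arg)
{
	struct example_state *st = this_cpu_ptr(&example_state);

	/*
	 * Relies on staying on the same CPU between these two accesses.
	 * preempt_disable() guaranteed that, migrate_disable() would too,
	 * a plain SRCU read section would not.
	 */
	st->count++;
	st->last_cpu = smp_processor_id();
}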

> > This is true for RT and !RT. So there is no need to do it here.
> > 
> > > The migrate_disable in the syscall tracepoint (which gets called by the
> > > system call version that doesn't disable migration, even in RT), needs to
> > > disable migration so that the accounting that happens in:
> > > 
> > >   trace_event_buffer_reserve()
> > > 
> > > matches what happens when that function gets called by a normal tracepoint
> > > callback.  
> > 
> > buh. But this is something. If we know that the call chain does not
> > disable migration, couldn't we just use a different function? I mean we
> > have tracing_gen_ctx_dec() and tracing_gen_ctx(). Wouldn't this work
> > for migrate_disable(), too? 
> > Just in case we need it and can not avoid it, see above.
> 
> I thought about that too. It would then create two different
> trace_event_buffer_reserve():
> 
> static __always_inline void *event_buffer_reserve(struct trace_event_buffer *fbuffer,
> 						  struct trace_event_file *trace_file,
> 						  unsigned long len, bool dec)
> {
> 	struct trace_event_call *event_call = trace_file->event_call;
> 
> 	if ((trace_file->flags & EVENT_FILE_FL_PID_FILTER) &&
> 	    trace_event_ignore_this_pid(trace_file))
> 		return NULL;
> 
> 	/*
> 	 * If CONFIG_PREEMPTION is enabled, then the tracepoint itself disables
> 	 * preemption (adding one to the preempt_count). Since we are
> 	 * interested in the preempt_count at the time the tracepoint was
> 	 * hit, we need to subtract one to offset the increment.
> 	 */
> 	fbuffer->trace_ctx = dec ? tracing_gen_ctx_dec() : tracing_gen_ctx();
> 	fbuffer->trace_file = trace_file;
> 
> 	fbuffer->event =
> 		trace_event_buffer_lock_reserve(&fbuffer->buffer, trace_file,
> 						event_call->event.type, len,
> 						fbuffer->trace_ctx);
> 	if (!fbuffer->event)
> 		return NULL;
> 
> 	fbuffer->regs = NULL;
> 	fbuffer->entry = ring_buffer_event_data(fbuffer->event);
> 	return fbuffer->entry;
> }
> 
> void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
> 				 struct trace_event_file *trace_file,
> 				 unsigned long len)
> {
> 	return event_buffer_reserve(fbuffer, trace_file, len, true);
> }
> 
> void *trace_syscall_event_buffer_reserve(struct trace_event_buffer *fbuffer,
> 					 struct trace_event_file *trace_file,
> 					 unsigned long len)
> {
> 	return event_buffer_reserve(fbuffer, trace_file, len, false);
> }
> 
> Hmm

Yeah. I *think* in the preempt case we always use one or the other.

So I would prefer this instead of explicitly disabling migration just
so that a function further down the stack can decrement the counter
again. Ideally, we don't disable migration to begin with.
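
As an illustration of where the non-decrementing variant would be used
(the caller below is made up; the reserve/commit pairing follows the
existing trace_event_buffer_* API):

#include <linux/trace_events.h>

static void syscall_trace_example(struct trace_event_file *trace_file,
				  unsigned long len)
{
	struct trace_event_buffer fbuffer;
	void *entry;

	/*
	 * The syscall path did not disable preemption or migration on the
	 * way in, so use the variant that does not subtract one from the
	 * recorded preempt count.
	 */
	entry = trace_syscall_event_buffer_reserve(&fbuffer, trace_file, len);
	if (!entry)
		return;

	/* ... fill in the entry ... */

	trace_event_buffer_commit(&fbuffer);
}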

_If_ the BPF side disables migration before invoking its program, then
any trace recording that happens within this program _should_ record
the migration counter at that time, which would be 1 at the minimum.
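
That is, the pattern is roughly (simplified sketch of what
bpf_prog_run_array() and its callers do, not an exact copy):

#include <linux/bpf.h>
#include <linux/filter.h>

static u32 run_one_bpf_prog(const struct bpf_prog *prog, const void *ctx)
{
	u32 ret;

	/*
	 * Migration is disabled by the BPF side around the invocation, so a
	 * trace event recorded from inside the program already sees a
	 * migration counter of at least 1.
	 */
	migrate_disable();
	ret = bpf_prog_run(prog, ctx);
	migrate_enable();

	return ret;
}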

> -- Steve

Sebastian
