linux-kernel - Re: linux-next: manual merge of the rcu tree with the ftrace tree

Message-ID: <20251114114828.5b7d4fe8@gandalf.local.home>
Date: Fri, 14 Nov 2025 11:48:28 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Sebastian Andrzej Siewior <bigeasy@...utronix.de>, Linux Kernel Mailing
 List <linux-kernel@...r.kernel.org>
Cc: Stephen Rothwell <sfr@...b.auug.org.au>, "Paul E. McKenney"
 <paulmck@...nel.org>, Frederic Weisbecker <frederic@...nel.org>, Neeraj
 Upadhyay <neeraj.upadhyay@...nel.org>, Boqun Feng <boqun.feng@...il.com>,
 Uladzislau Rezki <urezki@...il.com>, Masami Hiramatsu
 <mhiramat@...nel.org>, Linux Next Mailing List <linux-next@...r.kernel.org>
Subject: Re: linux-next: manual merge of the rcu tree with the ftrace tree

On Fri, 14 Nov 2025 17:33:30 +0100
Sebastian Andrzej Siewior <bigeasy@...utronix.de> wrote:

> > In PREEMPT_RT we do not disable preemption around the tracepoint
> > callback, but in non-RT we do. Instead, RT uses SRCU and migrate_disable().
> 
> I appreciate the effort. I really do. But why can't we have SRCU on both
> configs?

I don't know. Is there more overhead with disabling migration than
disabling preemption?

> 
> Also why does tracepoint_guard() need to disable migration? The BPF
> program already disables migrations (see for instance
> bpf_prog_run_array()).

We also would need to audit all tracepoint callbacks, as there may be some
assumptions about staying on the same CPU.

> This is true for RT and !RT. So there is no need to do it here.
> 
> > The migrate_disable in the syscall tracepoint (which gets called by the
> > system call version that doesn't disable migration, even in RT), needs to
> > disable migration so that the accounting that happens in:
> > 
> >   trace_event_buffer_reserve()
> > 
> > matches what happens when that function gets called by a normal tracepoint
> > callback.  
> 
> buh. But this is something. If we know that the call chain does not
> disable migration, couldn't we just use a different function? I mean we
> have tracing_gen_ctx_dec() and tracing_gen_ctx(). Wouldn't this work
> for migrate_disable(), too?
> Just in case we need it and cannot avoid it, see above.

I thought about that too. It would mean having two variants of
trace_event_buffer_reserve(), sharing one inlined helper:

static __always_inline void *event_buffer_reserve(struct trace_event_buffer *fbuffer,
						  struct trace_event_file *trace_file,
						  unsigned long len, bool dec)
{
	struct trace_event_call *event_call = trace_file->event_call;

	if ((trace_file->flags & EVENT_FILE_FL_PID_FILTER) &&
	    trace_event_ignore_this_pid(trace_file))
		return NULL;

	/*
	 * If CONFIG_PREEMPTION is enabled, then the tracepoint itself disables
	 * preemption (adding one to the preempt_count). Since we are
	 * interested in the preempt_count at the time the tracepoint was
	 * hit, we need to subtract one to offset the increment.
	 */
	fbuffer->trace_ctx = dec ? tracing_gen_ctx_dec() : tracing_gen_ctx();
	fbuffer->trace_file = trace_file;

	fbuffer->event =
		trace_event_buffer_lock_reserve(&fbuffer->buffer, trace_file,
						event_call->event.type, len,
						fbuffer->trace_ctx);
	if (!fbuffer->event)
		return NULL;

	fbuffer->regs = NULL;
	fbuffer->entry = ring_buffer_event_data(fbuffer->event);
	return fbuffer->entry;
}

void *trace_event_buffer_reserve(struct trace_event_buffer *fbuffer,
				 struct trace_event_file *trace_file,
				 unsigned long len)
{
	return event_buffer_reserve(fbuffer, trace_file, len, true);
}

void *trace_syscall_event_buffer_reserve(struct trace_event_buffer *fbuffer,
					 struct trace_event_file *trace_file,
					 unsigned long len)
{
	return event_buffer_reserve(fbuffer, trace_file, len, false);
}
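
To illustrate why the two entry points would pick different context helpers,
here is a rough user-space model. It is only a sketch: every model_* name is
made up for illustration and is not a kernel API, and it assumes the normal
tracepoint wrapper adds exactly one to the preempt count while the syscall
path adds nothing. The real trace_ctx also carries other flags that are left
out here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the current task's preempt_count. */
struct model_task {
	unsigned int preempt_count;
};

/* Models tracing_gen_ctx(): report the count exactly as it is now. */
static unsigned int model_gen_ctx(const struct model_task *t)
{
	return t->preempt_count;
}

/* Models tracing_gen_ctx_dec(): offset the wrapper's own increment. */
static unsigned int model_gen_ctx_dec(const struct model_task *t)
{
	return t->preempt_count - 1;
}

/* Models the shared helper: 'dec' selects which variant is used. */
static unsigned int model_reserve(const struct model_task *t, bool dec)
{
	return dec ? model_gen_ctx_dec(t) : model_gen_ctx(t);
}

/* Normal tracepoint: the wrapper bumps the count around the callback. */
static unsigned int model_normal_tracepoint(struct model_task *t)
{
	unsigned int recorded;

	t->preempt_count++;			/* wrapper disables preemption */
	recorded = model_reserve(t, true);	/* so subtract one again */
	t->preempt_count--;			/* wrapper re-enables preemption */
	return recorded;
}

/* Syscall tracepoint: no wrapper increment, so nothing to subtract. */
static unsigned int model_syscall_tracepoint(struct model_task *t)
{
	return model_reserve(t, false);
}
```

In this model both paths record the count as it was when the tracepoint was
hit, which is the property the two reserve functions are meant to preserve.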

Hmm

-- Steve