Message-ID: <020d22f0-a95b-4204-a611-eb3953c33f32@efficios.com>
Date: Wed, 23 Jul 2025 17:40:20 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: "Paul E. McKenney" <paulmck@...nel.org>, rcu@...r.kernel.org
Cc: linux-kernel@...r.kernel.org, kernel-team@...a.com, rostedt@...dmis.org,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>, bpf@...r.kernel.org
Subject: Re: [PATCH v4 4/6] tracing: Guard __DECLARE_TRACE() use of
 __DO_TRACE_CALL() with SRCU-fast

On 2025-07-23 16:27, Paul E. McKenney wrote:
> The current use of guard(preempt_notrace)() within __DECLARE_TRACE()
> to protect invocation of __DO_TRACE_CALL() means that BPF programs
> attached to tracepoints are non-preemptible.  This is unhelpful in
> real-time systems, whose users apparently wish to use BPF while also
> achieving low latencies.  (Who knew?)
> 
> One option would be to use preemptible RCU, but this introduces
> many opportunities for infinite recursion, which many consider to
> be counterproductive, especially given the relatively small stacks
> provided by the Linux kernel.  These opportunities could be shut down
> by sufficiently energetic duplication of code, but this sort of thing
> is considered impolite in some circles.
> 
> Therefore, use the shiny new SRCU-fast API, which provides somewhat faster
> readers than those of preemptible RCU, at least on my laptop, where
> task_struct access is more expensive than access to per-CPU variables.
> And SRCU fast provides way faster readers than does SRCU, courtesy of
> being able to avoid the read-side use of smp_mb().  Also, it is quite
> straightforward to create srcu_read_{,un}lock_fast_notrace() functions.

As-is, this will break the tracer callbacks: some tracers expect the
tracepoint callback to be invoked with preemption disabled, for various
reasons, including preventing migration.

Before making this change, we'd first need to add preempt-off guards in
the tracer callbacks that require them, e.g. along the lines of the
sketch below.

I've done something similar for the syscall tracepoints when introducing
faultable syscall tracepoints:

4aadde89d81f tracing/bpf: disable preemption in syscall probe
65e7462a16ce tracing/perf: disable preemption in syscall probe
13d750c2c03e tracing/ftrace: disable preemption in syscall probe
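
Roughly, each affected probe would take its own guard, mirroring what
the syscall probe commits above did. Minimal sketch only; the probe
name and argument are hypothetical, the real callbacks live in the
BPF, perf and ftrace tracepoint glue:

/*
 * Once the tracepoint caller only holds SRCU-fast, the probe disables
 * preemption itself for the parts that must not migrate.
 */
static notrace void example_trace_event_probe(void *data, int arg)
{
	guard(preempt_notrace)();

	/* ... existing per-tracer work that assumes no migration ... */
}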

Thanks,

Mathieu

> 
> While in the area, SRCU now supports early boot call_srcu().  Therefore,
> remove the checks that used to avoid such use from rcu_free_old_probes()
> before this commit was applied:
> 
> e53244e2c893 ("tracepoint: Remove SRCU protection")
> 
> The current commit can be thought of as an approximate revert of that
> commit.
> 
> Link: https://lore.kernel.org/all/20250613152218.1924093-1-bigeasy@linutronix.de/
> Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> Cc: <bpf@...r.kernel.org>
> ---
>   include/linux/tracepoint.h |  6 ++++--
>   kernel/tracepoint.c        | 21 ++++++++++++++++++++-
>   2 files changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 826ce3f8e1f85..a22c1ab88560b 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -33,6 +33,8 @@ struct trace_eval_map {
>   
>   #define TRACEPOINT_DEFAULT_PRIO	10
>   
> +extern struct srcu_struct tracepoint_srcu;
> +
>   extern int
>   tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data);
>   extern int
> @@ -115,7 +117,7 @@ void for_each_tracepoint_in_module(struct module *mod,
>   static inline void tracepoint_synchronize_unregister(void)
>   {
>   	synchronize_rcu_tasks_trace();
> -	synchronize_rcu();
> +	synchronize_srcu(&tracepoint_srcu);
>   }
>   static inline bool tracepoint_is_faultable(struct tracepoint *tp)
>   {
> @@ -271,7 +273,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>   	static inline void __do_trace_##name(proto)			\
>   	{								\
>   		if (cond) {						\
> -			guard(preempt_notrace)();			\
> +			guard(srcu_fast_notrace)(&tracepoint_srcu);	\
>   			__DO_TRACE_CALL(name, TP_ARGS(args));		\
>   		}							\
>   	}								\
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 62719d2941c90..e19973015cbd7 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -25,6 +25,9 @@ enum tp_func_state {
>   extern tracepoint_ptr_t __start___tracepoints_ptrs[];
>   extern tracepoint_ptr_t __stop___tracepoints_ptrs[];
>   
> +DEFINE_SRCU(tracepoint_srcu);
> +EXPORT_SYMBOL_GPL(tracepoint_srcu);
> +
>   enum tp_transition_sync {
>   	TP_TRANSITION_SYNC_1_0_1,
>   	TP_TRANSITION_SYNC_N_2_1,
> @@ -34,6 +37,7 @@ enum tp_transition_sync {
>   
>   struct tp_transition_snapshot {
>   	unsigned long rcu;
> +	unsigned long srcu_gp;
>   	bool ongoing;
>   };
>   
> @@ -46,6 +50,7 @@ static void tp_rcu_get_state(enum tp_transition_sync sync)
>   
>   	/* Keep the latest get_state snapshot. */
>   	snapshot->rcu = get_state_synchronize_rcu();
> +	snapshot->srcu_gp = start_poll_synchronize_srcu(&tracepoint_srcu);
>   	snapshot->ongoing = true;
>   }
>   
> @@ -56,6 +61,8 @@ static void tp_rcu_cond_sync(enum tp_transition_sync sync)
>   	if (!snapshot->ongoing)
>   		return;
>   	cond_synchronize_rcu(snapshot->rcu);
> +	if (!poll_state_synchronize_srcu(&tracepoint_srcu, snapshot->srcu_gp))
> +		synchronize_srcu(&tracepoint_srcu);
>   	snapshot->ongoing = false;
>   }
>   
> @@ -101,17 +108,29 @@ static inline void *allocate_probes(int count)
>   	return p == NULL ? NULL : p->probes;
>   }
>   
> -static void rcu_free_old_probes(struct rcu_head *head)
> +static void srcu_free_old_probes(struct rcu_head *head)
>   {
>   	kfree(container_of(head, struct tp_probes, rcu));
>   }
>   
> +static void rcu_free_old_probes(struct rcu_head *head)
> +{
> +	call_srcu(&tracepoint_srcu, head, srcu_free_old_probes);
> +}
> +
>   static inline void release_probes(struct tracepoint *tp, struct tracepoint_func *old)
>   {
>   	if (old) {
>   		struct tp_probes *tp_probes = container_of(old,
>   			struct tp_probes, probes[0]);
>   
> +		/*
> +		 * Tracepoint probes are protected by either RCU or
> +		 * Tasks Trace RCU and also by SRCU.  By calling the SRCU
> +		 * callback in the [Tasks Trace] RCU callback we cover
> +		 * both cases. So let us chain the SRCU and [Tasks Trace]
> +		 * RCU callbacks to wait for both grace periods.
> +		 */
>   		if (tracepoint_is_faultable(tp))
>   			call_rcu_tasks_trace(&tp_probes->rcu, rcu_free_old_probes);
>   		else


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
