linux-kernel - Re: [PATCH v3] tracing: Guard __DECLARE_TRACE() use of __DO_TRACE

Message-ID: <39252902-567b-4e74-b6c4-91eae1df7c0d@paulmck-laptop>
Date: Fri, 12 Dec 2025 16:06:09 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Steve Rostedt <rostedt@...dmis.org>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	"bpf@...r.kernel.org" <bpf@...r.kernel.org>
Subject: Re: [PATCH v3] tracing: Guard __DECLARE_TRACE() use of
 __DO_TRACE_CALL() with SRCU-fast

On Fri, Dec 12, 2025 at 11:54:28PM +0000, Joel Fernandes wrote:
> 
> 
> > On Dec 13, 2025, at 8:10 AM, Paul E. McKenney <paulmck@...nel.org> wrote:
> > 
> > On Fri, Dec 12, 2025 at 09:28:37AM +0000, Joel Fernandes wrote:
> >> 
> >> 
> >>>> On Dec 12, 2025, at 4:50 PM, Paul E. McKenney <paulmck@...nel.org> wrote:
> >>> 
> >>> On Fri, Dec 12, 2025 at 03:43:07AM +0000, Joel Fernandes wrote:
> >>>> 
> >>>> 
> >>>>>> On Dec 12, 2025, at 9:47 AM, Paul E. McKenney <paulmck@...nel.org> wrote:
> >>>>> 
> >>>>> On Fri, Dec 12, 2025 at 09:12:07AM +0900, Joel Fernandes wrote:
> >>>>>> 
> >>>>>> 
> >>>>>>> On 12/11/2025 3:23 PM, Paul E. McKenney wrote:
> >>>>>>> On Thu, Dec 11, 2025 at 08:02:15PM +0000, Joel Fernandes wrote:
> >>>>>>>> 
> >>>>>>>> 
> >>>>>>>>> On Dec 8, 2025, at 1:20 PM, Paul E. McKenney <paulmck@...nel.org> wrote:
> >>>>>>>>> 
> >>>>>>>>> The current use of guard(preempt_notrace)() within __DECLARE_TRACE()
> >>>>>>>>> to protect invocation of __DO_TRACE_CALL() means that BPF programs
> >>>>>>>>> attached to tracepoints are non-preemptible.  This is unhelpful in
> >>>>>>>>> real-time systems, whose users apparently wish to use BPF while also
> >>>>>>>>> achieving low latencies.  (Who knew?)
> >>>>>>>>> 
> >>>>>>>>> One option would be to use preemptible RCU, but this introduces
> >>>>>>>>> many opportunities for infinite recursion, which many consider to
> >>>>>>>>> be counterproductive, especially given the relatively small stacks
> >>>>>>>>> provided by the Linux kernel.  These opportunities could be shut down
> >>>>>>>>> by sufficiently energetic duplication of code, but this sort of thing
> >>>>>>>>> is considered impolite in some circles.
> >>>>>>>>> 
> >>>>>>>>> Therefore, use the shiny new SRCU-fast API, which provides somewhat faster
> >>>>>>>>> readers than those of preemptible RCU, at least on Paul E. McKenney's
> >>>>>>>>> laptop, where task_struct access is more expensive than access to per-CPU
> >>>>>>>>> variables.  And SRCU-fast provides way faster readers than does SRCU,
> >>>>>>>>> courtesy of being able to avoid the read-side use of smp_mb().  Also,
> >>>>>>>>> it is quite straightforward to create srcu_read_{,un}lock_fast_notrace()
> >>>>>>>>> functions.
> >>>>>>>>> 
> >>>>>>>>> While in the area, SRCU now supports early boot call_srcu().  Therefore,
> >>>>>>>>> remove the checks that used to avoid such use from rcu_free_old_probes()
> >>>>>>>>> before this commit was applied:
> >>>>>>>>> 
> >>>>>>>>> e53244e2c893 ("tracepoint: Remove SRCU protection")
> >>>>>>>>> 
> >>>>>>>>> The current commit can be thought of as an approximate revert of that
> >>>>>>>>> commit, with some compensating additions of preemption disabling.
> >>>>>>>>> This preemption disabling uses guard(preempt_notrace)().
> >>>>>>>>> 
> >>>>>>>>> However, Yonghong Song points out that BPF assumes that non-sleepable
> >>>>>>>>> BPF programs will remain on the same CPU, which means that migration
> >>>>>>>>> must be disabled whenever preemption remains enabled.  In addition,
> >>>>>>>>> non-RT kernels have performance expectations that would be violated by
> >>>>>>>>> allowing the BPF programs to be preempted.
> >>>>>>>>> 
> >>>>>>>>> Therefore, continue to disable preemption in non-RT kernels, and protect
> >>>>>>>>> the BPF program with both SRCU and migration disabling for RT kernels,
> >>>>>>>>> and even then only if preemption is not already disabled.
> >>>>>>>> 
> >>>>>>>> Hi Paul,
> >>>>>>>> 
> >>>>>>>> Is there a reason not to let non-RT kernels also benefit from SRCU-fast for BPF programs attached to tracepoints? It can be a follow-up patch if needed.
> >>>>>>> 
> >>>>>>> Because in some cases the non-RT benefit is suspected to be negative
> >>>>>>> due to increasing the probability of preemption in awkward places.
> >>>>>> 
> >>>>>> Since you said "suspected", I am guessing no concrete data has been collected
> >>>>>> to substantiate that specifically for BPF programs, but correct me if I missed
> >>>>>> something. Assuming you're referring to latency tradeoffs due to
> >>>>>> preemption, Android is not PREEMPT_RT but is expected to be low latency in
> >>>>>> general as well. So is this decision the right one for Android as well,
> >>>>>> considering that (I heard) it uses BPF? Just an open-ended question.
> >>>>>> 
> >>>>>> There is also the issue of two different paths for PREEMPT_RT versus otherwise,
> >>>>>> which complicates the tracing side, so there had better be a reason for that, I guess.
> >>>>> 
> >>>>> You are advocating a change in behavior for non-RT workloads.  Why do
> >>>>> you believe that this change would be OK for those workloads?
> >>>> 
> >>>> Same reasons I provided in my last email. If we are saying SRCU-fast is required for lower latency, I find it strange that we are leaving out Android, which has low-latency audio use cases, for instance.
> >>> 
> >>> If Android provides numbers showing that it helps them, then it is easy
> >>> to provide a Kconfig option that defaults to PREEMPT_RT, but that Android
> >>> can override.  Right?
> >> 
> >> Sure, but my suspicion is that Android and others are not going to look into every PREEMPT_RT-specific optimization (not just this one) to see whether it benefits their interactivity use cases. They will simply miss out on it without knowing that they are.
> >> 
> >> It might be a good idea (for me) to explore how many such optimizations exist though, that we take for granted. I will look into exploring this on my side. :)
> > 
> > One workload's optimization is another workload's pessimization, in
> > part because there are a lot of different measures of performance that
> > different workloads care about.
> > 
> > But as a practical matter, this is Steven's decision.
> > 
> > Though if he does change the behavior on non-RT setups, I would thank
> > him to remove my name from the commit, or at least record in the commit
> > log that I object to changing other workloads' behaviors.
> 
> You have a point. I am not saying we should do this for sure but should at least consider / explore it.

Now *that* I have no problem with, as long as the consideration and
exploration is very public and includes the usual BPF/tracing suspects.

							Thanx, Paul

> Thanks.
> 
> 
> 
> > 
> >                            Thanx, Paul
> > 
> >> thanks,
> >> 
> >> - Joel
> >> 
> >>> 
> >>>                           Thanx, Paul
> >>> 
> >>>> Thanks,
> >>>> 
> >>>> - Joel
> >>>> 
> >>>> 
> >>>>> 
> >>>>>                          Thanx, Paul
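
For readers following the thread, the policy stated at the end of the quoted patch description (SRCU-fast plus migration disabling only on PREEMPT_RT kernels, and only when preemption is not already disabled; preemption disabling otherwise) can be modeled as a small decision function. This is a user-space sketch, not the patch itself: the names pick_guard(), guard_name(), and enum guard_kind are hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical user-space model of the policy in the patch description.
 * None of these names exist in the kernel.
 */
enum guard_kind {
	/* Callback runs non-preemptible, either because the tracepoint
	 * disables preemption or because the caller already had. */
	GUARD_PREEMPT_OFF,
	/* Callback runs inside an SRCU-fast reader with migration
	 * disabled, so it stays preemptible but on the same CPU. */
	GUARD_SRCU_FAST_NOMIGR,
};

/*
 * preempt_rt:  models whether the kernel is built with PREEMPT_RT.
 * preemptible: models whether the caller can be preempted on entry.
 */
static enum guard_kind pick_guard(bool preempt_rt, bool preemptible)
{
	if (preempt_rt && preemptible)
		return GUARD_SRCU_FAST_NOMIGR;
	return GUARD_PREEMPT_OFF;
}

/* Human-readable label for a guard kind, for logging or tests. */
static const char *guard_name(enum guard_kind g)
{
	return g == GUARD_SRCU_FAST_NOMIGR ?
		"srcu-fast + migrate-disable" : "preempt-disable";
}
```

In this model only one of the four input combinations selects the SRCU-fast path, which is the point under debate in the thread: non-RT kernels keep their existing behavior unless that condition is widened.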