linux-kernel - Re: [PATCH 17/19] srcu: Optimize SRCU-fast-updown for arm64

Message-ID: <8a33bf08-8ca4-4fc1-9481-fff2247e5518@paulmck-laptop>
Date: Mon, 3 Nov 2025 11:17:50 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: rcu@...r.kernel.org, linux-kernel@...r.kernel.org, kernel-team@...a.com,
	rostedt@...dmis.org, Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will@...nel.org>, Mark Rutland <mark.rutland@....com>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
	linux-arm-kernel@...ts.infradead.org, bpf@...r.kernel.org
Subject: Re: [PATCH 17/19] srcu: Optimize SRCU-fast-updown for arm64

On Mon, Nov 03, 2025 at 01:16:23PM -0500, Mathieu Desnoyers wrote:
> On 2025-11-03 12:08, Paul E. McKenney wrote:
> > On Mon, Nov 03, 2025 at 08:34:10AM -0500, Mathieu Desnoyers wrote:
> [...]
> 
> > > One example is the libside (user level) rcu implementation which uses
> > > two counters per cpu [1]. One counter is the rseq fast path, and the
> > > second counter is for atomics (as fallback).
> > > 
> > > If the typical scenario we want to optimize for is thread context, we
> > > can probably remove the atomic from the fast path with just preempt off
> > > by partitioning the per-cpu counters further, one possibility being:
> > > 
> > > struct percpu_srcu_fast_pair {
> > > 	unsigned long lock, unlock;
> > > };
> > > 
> > > struct percpu_srcu_fast {
> > > 	struct percpu_srcu_fast_pair thread;
> > > 	struct percpu_srcu_fast_pair irq;
> > > };
> > > 
> > > And the grace period sums both thread and irq counters.
> > > 
> > > Thoughts ?
> > 
> > One complication here is that we need srcu_down_read() at task level
> > and the matching srcu_up_read() at softirq and/or hardirq level.
> > 
> > Or am I missing a trick in your proposed implementation?
> 
> I think you are indeed missing the crux of the solution here.
> 
> Increments from task level and from soft/hard irq level will be
> dispatched into different counters (thread vs irq). But the
> grace period will sum, for each of the two periods one after the
> next, the unlock counts and then the lock counts. It will consider
> the period as quiescent if the delta between the two sums is zero,
> e.g.
> 
>   (count[period].irq.unlock + count[period].thread.unlock -
>    count[period].irq.lock - count[period].thread.lock) == 0
> 
> so the sum does not care how the counters were incremented
> (it just does a load-relaxed), but each counter category
> has its own way of dealing with concurrency (thread: percpu
> ops, irq: atomics).
> 
> This is effectively a use of split-counters, but the split
> is across concurrency handling mechanisms rather than across
> CPUs.
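
The summation described above can be sketched in user-space C11 as
follows. This is only an illustration under stated assumptions, not the
kernel implementation: NR_CPUS, NR_PERIODS, enum sketch_ctx and all
sketch_*() names are hypothetical, a plain load/store pair stands in for
a per-cpu operation done with preemption disabled, softirq and hardirq
share the single "irq" pair, and the fence between the two sums is an
assumed placeholder for whatever ordering the real grace-period code
would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS		4	/* hypothetical, for illustration only */
#define NR_PERIODS	2	/* the two periods mentioned in the thread */

struct percpu_srcu_fast_pair {
	atomic_ulong lock, unlock;
};

struct percpu_srcu_fast {
	struct percpu_srcu_fast_pair thread;	/* updated by per-cpu ops */
	struct percpu_srcu_fast_pair irq;	/* updated by atomics */
};

/* Hypothetical context tag; the kernel would infer this itself. */
enum sketch_ctx { SKETCH_THREAD, SKETCH_IRQ };

static struct percpu_srcu_fast count[NR_PERIODS][NR_CPUS];

static void sketch_inc(atomic_ulong *c, enum sketch_ctx ctx)
{
	if (ctx == SKETCH_THREAD) {
		/* Stand-in for a per-cpu increment with preemption off:
		 * a non-atomic read-modify-write of a private counter. */
		unsigned long v = atomic_load_explicit(c, memory_order_relaxed);

		atomic_store_explicit(c, v + 1, memory_order_relaxed);
	} else {
		/* The irq category pays for a real atomic increment. */
		atomic_fetch_add_explicit(c, 1, memory_order_relaxed);
	}
}

static struct percpu_srcu_fast *sketch_slot(int period, int cpu)
{
	assert(period >= 0 && period < NR_PERIODS);
	assert(cpu >= 0 && cpu < NR_CPUS);
	return &count[period][cpu];
}

static void sketch_down_read(int period, int cpu, enum sketch_ctx ctx)
{
	struct percpu_srcu_fast *p = sketch_slot(period, cpu);

	sketch_inc(ctx == SKETCH_THREAD ? &p->thread.lock : &p->irq.lock, ctx);
}

static void sketch_up_read(int period, int cpu, enum sketch_ctx ctx)
{
	struct percpu_srcu_fast *p = sketch_slot(period, cpu);

	sketch_inc(ctx == SKETCH_THREAD ? &p->thread.unlock : &p->irq.unlock, ctx);
}

/* Sum the unlock counts first, then the lock counts, using relaxed loads. */
static bool sketch_period_quiescent(int period)
{
	unsigned long unlocks = 0, locks = 0;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		struct percpu_srcu_fast *p = sketch_slot(period, cpu);

		unlocks += atomic_load_explicit(&p->thread.unlock, memory_order_relaxed);
		unlocks += atomic_load_explicit(&p->irq.unlock, memory_order_relaxed);
	}

	/* Assumed placeholder for the ordering between the two sums. */
	atomic_thread_fence(memory_order_seq_cst);

	for (cpu = 0; cpu < NR_CPUS; cpu++) {
		struct percpu_srcu_fast *p = sketch_slot(period, cpu);

		locks += atomic_load_explicit(&p->thread.lock, memory_order_relaxed);
		locks += atomic_load_explicit(&p->irq.lock, memory_order_relaxed);
	}

	/* Unsigned arithmetic keeps the delta correct across wraparound. */
	return unlocks - locks == 0;
}
```

For example, sketch_down_read(0, 1, SKETCH_THREAD) followed by
sketch_up_read(0, 3, SKETCH_IRQ) leaves period 0 quiescent even though
the two increments landed on different CPUs and in different counter
categories, because only the totals are compared.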

Ah, got it, thank you!  But we would need an additional softirq counter,
correct?

I will keep this in my back pocket in case Catalin's and Yicong's prefetch
trick turns out to be problematic, and again, thank you!

							Thanx, Paul