linux-kernel - Re: [RFC PATCH v4 3/4] hazptr: Implement Hazard Pointers

Message-ID: <6c96dbb5-bffc-423f-bb6a-3072abb5f711@efficios.com>
Date: Fri, 19 Dec 2025 09:22:19 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Boqun Feng <boqun.feng@...il.com>
Cc: Joel Fernandes <joel@...lfernandes.org>,
 "Paul E. McKenney" <paulmck@...nel.org>, linux-kernel@...r.kernel.org,
 Nicholas Piggin <npiggin@...il.com>, Michael Ellerman <mpe@...erman.id.au>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 Will Deacon <will@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
 Alan Stern <stern@...land.harvard.edu>, John Stultz <jstultz@...gle.com>,
 Neeraj Upadhyay <Neeraj.Upadhyay@....com>,
 Linus Torvalds <torvalds@...ux-foundation.org>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Frederic Weisbecker <frederic@...nel.org>,
 Josh Triplett <josh@...htriplett.org>, Uladzislau Rezki <urezki@...il.com>,
 Steven Rostedt <rostedt@...dmis.org>, Lai Jiangshan
 <jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>,
 Ingo Molnar <mingo@...hat.com>, Waiman Long <longman@...hat.com>,
 Mark Rutland <mark.rutland@....com>, Thomas Gleixner <tglx@...utronix.de>,
 Vlastimil Babka <vbabka@...e.cz>, maged.michael@...il.com,
 Mateusz Guzik <mjguzik@...il.com>,
 Jonas Oberhauser <jonas.oberhauser@...weicloud.com>, rcu@...r.kernel.org,
 linux-mm@...ck.org, lkmm@...ts.linux.dev
Subject: Re: [RFC PATCH v4 3/4] hazptr: Implement Hazard Pointers

On 2025-12-18 19:43, Boqun Feng wrote:
> On Thu, Dec 18, 2025 at 12:35:18PM -0500, Mathieu Desnoyers wrote:
> [...]
>>> Could you utilize this[1] to see a
>>> comparison of the reader-side performance against RCU/SRCU?
>>
>> Good point! Let's see.
>>
>> On an AMD 2x EPYC 9654 96-Core Processor with 192 cores,
>> hyperthreading disabled,
>> CONFIG_PREEMPT=y,
>> CONFIG_PREEMPT_RCU=y,
>> CONFIG_PREEMPT_HAZPTR=y.
>>
>> scale_type                 ns
>> -----------------------------
>> hazptr-smp-mb             13.1   <- this implementation
>> hazptr-barrier            11.5   <- replace smp_mb() on acquire with barrier(), requires IPIs on synchronize.
>> hazptr-smp-mb-hlist       12.7   <- replace per-task hp context and per-cpu overflow lists by hlist.
>> rcu                       17.0
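To make the difference between the hazptr-smp-mb and hazptr-barrier rows concrete, here is a minimal user-space sketch, written with C11 atomics, of a hazard-pointer acquire that uses a full fence on the reader side. All names (struct hp_slot, hp_protect(), hp_release(), hp_is_protected(), hp_demo()) are hypothetical and are not the API of this patch series. Slot allocation, overflow handling and the reclaim loop are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical single hazard-pointer slot (illustration only). */
struct hp_slot {
	_Atomic(void *) addr;
};

/* Reader: publish p in the slot, full fence, then re-load the source. */
static void *hp_protect(struct hp_slot *slot, _Atomic(void *) *src)
{
	void *p = atomic_load_explicit(src, memory_order_relaxed);

	for (;;) {
		void *q;

		if (!p) {
			atomic_store_explicit(&slot->addr, NULL, memory_order_relaxed);
			return NULL;
		}
		atomic_store_explicit(&slot->addr, p, memory_order_relaxed);
		/* smp_mb() analogue: order the slot store before the re-load. */
		atomic_thread_fence(memory_order_seq_cst);
		q = atomic_load_explicit(src, memory_order_acquire);
		if (q == p)
			return p;	/* still published: p is now protected */
		p = q;			/* raced with an update: retry */
	}
}

static void hp_release(struct hp_slot *slot)
{
	atomic_store_explicit(&slot->addr, NULL, memory_order_release);
}

/*
 * Updater: call only after p has been unpublished from its source.
 * The fence pairs with the reader's fence: either the reader's re-load
 * sees the new value and retries, or this scan sees the reader's slot.
 */
static int hp_is_protected(struct hp_slot *slots, size_t n, void *p)
{
	atomic_thread_fence(memory_order_seq_cst);
	for (size_t i = 0; i < n; i++)
		if (atomic_load_explicit(&slots[i].addr, memory_order_relaxed) == p)
			return 1;
	return 0;
}

/* Usage: protect, unpublish, check, release. */
static void hp_demo(void)
{
	static int obj = 42;
	_Atomic(void *) src;
	struct hp_slot slots[2];

	atomic_init(&src, &obj);
	atomic_init(&slots[0].addr, NULL);
	atomic_init(&slots[1].addr, NULL);

	int *p = hp_protect(&slots[0], &src);
	assert(p == &obj && *p == 42);

	atomic_store(&src, NULL);			/* updater unpublishes obj */
	assert(hp_is_protected(slots, 2, &obj));	/* must not free yet */

	hp_release(&slots[0]);
	assert(!hp_is_protected(slots, 2, &obj));	/* now reclaimable */
}
```

The hazptr-barrier row corresponds to demoting the reader's fence to a compiler-only barrier (atomic_signal_fence() in C11) and having the updater impose the ordering on all CPUs instead, via IPIs in the kernel or membarrier(2) in user space. That variant is not shown here.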
> 
> Hmm.. now looking back, how is it possible that hazptr is faster than
> RCU on the reader-side? Because a grace period was happening and
> triggered rcu_read_unlock_special()? This is actually more interesting.

So I could be entirely misreading the code, but we have:

rcu_flavor_sched_clock_irq():
[...]
         /* If GP is oldish, ask for help from rcu_read_unlock_special(). */
         if (rcu_preempt_depth() > 0 &&
             __this_cpu_read(rcu_data.core_needs_qs) &&
             __this_cpu_read(rcu_data.cpu_no_qs.b.norm) &&
             !t->rcu_read_unlock_special.b.need_qs &&
             time_after(jiffies, rcu_state.gp_start + HZ))
                 t->rcu_read_unlock_special.b.need_qs = true;

which means we set need_qs = true as a result of observing
cpu_no_qs.b.norm == true.

This is sufficient to trigger calls (plural) to rcu_read_unlock_special()
from __rcu_read_unlock().

But then, if we look at rcu_preempt_deferred_qs_irqrestore(),
which we would expect to clear the rcu_read_unlock_special.b.need_qs
state, we have this:

         special = t->rcu_read_unlock_special;
         if (!special.s && !rdp->cpu_no_qs.b.exp) {
                 local_irq_restore(flags);
                 return;
         }
         t->rcu_read_unlock_special.s = 0;

which skips clearing the state unless an expedited grace period
is required.

So unless I'm missing something, we should _also_ clear that state
when it's invoked after rcu_flavor_sched_clock_irq(), so that subsequent
__rcu_read_unlock() calls don't all call into rcu_read_unlock_special().
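A simplified, single-threaded C model of the concern above: once the tick sets need_qs, every outermost unlock takes the slow path until something clears the flag. Everything here (union model_special, struct model_task, model_tick(), model_read_unlock(), MODEL_HZ, the slow_path_clears switch) is hypothetical and only mirrors the shape of the kernel code quoted above. The switch stands in for the open question of whether the real slow path clears the state; the model does not answer it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same wraparound-safe comparison as the kernel's time_after(). */
#define model_time_after(a, b)	((long)((b) - (a)) < 0)
#define MODEL_HZ		1000UL	/* assumption: HZ=1000 */

/* Modeled on union rcu_special: four flag bytes aliased by .s */
union model_special {
	struct {
		uint8_t blocked;
		uint8_t need_qs;
		uint8_t exp_hint;
		uint8_t need_mb;
	} b;
	uint32_t s;
};

struct model_task {
	int nesting;			/* rcu_read_lock() nesting depth */
	union model_special special;
	unsigned long slow_calls;	/* times the slow path ran */
};

/* Tick-side check (core_needs_qs assumed true): set need_qs when inside
 * a reader and the grace period is more than MODEL_HZ jiffies old. */
static void model_tick(struct model_task *t, bool cpu_no_qs_norm,
		       unsigned long jiffies, unsigned long gp_start)
{
	if (t->nesting > 0 && cpu_no_qs_norm && !t->special.b.need_qs &&
	    model_time_after(jiffies, gp_start + MODEL_HZ))
		t->special.b.need_qs = 1;
}

static void model_read_lock(struct model_task *t)
{
	t->nesting++;
}

/* Outermost unlock takes the slow path whenever any special bit is set. */
static void model_read_unlock(struct model_task *t, bool slow_path_clears)
{
	assert(t->nesting > 0);
	if (--t->nesting == 0 && t->special.s) {
		t->slow_calls++;	/* stands in for rcu_read_unlock_special() */
		if (slow_path_clears)
			t->special.s = 0;
	}
}

/* n critical sections; the tick fires once, inside the first one. */
static unsigned long model_run(int n, bool slow_path_clears)
{
	struct model_task t = { 0 };

	for (int i = 0; i < n; i++) {
		model_read_lock(&t);
		if (i == 0)
			model_tick(&t, true, 5000, 1000);	/* GP 4000 jiffies old */
		model_read_unlock(&t, slow_path_clears);
	}
	return t.slow_calls;
}
```

In this model, model_run(1000, false) returns 1000 (every unlock takes the slow path), while model_run(1000, true) returns 1.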

I'm adding a big warning about sleep deprivation and possibly
misunderstanding the whole thing. What am I missing?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com