Message-ID: <fcbc99d0-b7c7-45f7-9cf8-f5d3bfad966e@efficios.com>
Date: Thu, 18 Dec 2025 12:35:18 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Boqun Feng <boqun.feng@...il.com>
Cc: Joel Fernandes <joel@...lfernandes.org>,
"Paul E. McKenney" <paulmck@...nel.org>, linux-kernel@...r.kernel.org,
Nicholas Piggin <npiggin@...il.com>, Michael Ellerman <mpe@...erman.id.au>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Will Deacon <will@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Alan Stern <stern@...land.harvard.edu>, John Stultz <jstultz@...gle.com>,
Neeraj Upadhyay <Neeraj.Upadhyay@....com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Frederic Weisbecker <frederic@...nel.org>,
Josh Triplett <josh@...htriplett.org>, Uladzislau Rezki <urezki@...il.com>,
Steven Rostedt <rostedt@...dmis.org>, Lai Jiangshan
<jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>,
Ingo Molnar <mingo@...hat.com>, Waiman Long <longman@...hat.com>,
Mark Rutland <mark.rutland@....com>, Thomas Gleixner <tglx@...utronix.de>,
Vlastimil Babka <vbabka@...e.cz>, maged.michael@...il.com,
Mateusz Guzik <mjguzik@...il.com>,
Jonas Oberhauser <jonas.oberhauser@...weicloud.com>, rcu@...r.kernel.org,
linux-mm@...ck.org, lkmm@...ts.linux.dev
Subject: Re: [RFC PATCH v4 3/4] hazptr: Implement Hazard Pointers
On 2025-12-18 03:36, Boqun Feng wrote:
> On Wed, Dec 17, 2025 at 08:45:30PM -0500, Mathieu Desnoyers wrote:
[...]
>> +static inline
>> +void *hazptr_acquire(struct hazptr_ctx *ctx, void * const * addr_p)
>> +{
>> +	struct hazptr_slot *slot = NULL;
>> +	void *addr, *addr2;
>> +
>> +	/*
>> +	 * Load @addr_p to know which address should be protected.
>> +	 */
>> +	addr = READ_ONCE(*addr_p);
>> +	for (;;) {
>> +		if (!addr)
>> +			return NULL;
>> +		guard(preempt)();
>> +		if (likely(!hazptr_slot_is_backup(ctx, slot))) {
>> +			slot = hazptr_get_free_percpu_slot();
>
> I need to continue to share my concerns about this "allocating slot
> while protecting" pattern. Here, realistically, we will go over a few
> of the per-CPU hazard pointer slots *every time* instead of directly
> using a pre-allocated hazard pointer slot.
No, that's not the expected fast path with CONFIG_PREEMPT_HAZPTR=y
(introduced in patch 4/4).

With PREEMPT_HAZPTR, using more than one hazard pointer per CPU will
only happen if there are nested hazard pointer users, which can happen
due to:

- Holding a hazard pointer across function calls, where another hazard
  pointer is used.
- Using hazard pointers from interrupt handlers (note: my current code
  only disables preemption, not irqs; this is something I'd need to
  change if we wish to acquire/release hazard pointers from interrupt
  handlers). But even that should be a rare event.

So the fast path starts from a state with no hazard pointers in use on
the CPU, which means hazptr_acquire() finds its empty slot at index 0.
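
For illustration, here is roughly what that fast path looks like in my
mental model (a minimal sketch only: the slot count, the per-CPU layout
and the helper body are my assumptions, not necessarily what the patch
does):

struct hazptr_slot {
	void *addr;
};

#define NR_HAZPTR_PERCPU_SLOTS	8	/* assumed slot count */

static DEFINE_PER_CPU(struct hazptr_slot,
		      hazptr_percpu_slots[NR_HAZPTR_PERCPU_SLOTS]);

/* Called with preemption disabled (see guard(preempt)() above). */
static struct hazptr_slot *hazptr_get_free_percpu_slot(void)
{
	struct hazptr_slot *slots = this_cpu_ptr(hazptr_percpu_slots);
	int i;

	/*
	 * With no nesting on this CPU, slot 0 is free and the loop
	 * exits on its first iteration: that is the expected fast
	 * path.
	 */
	for (i = 0; i < NR_HAZPTR_PERCPU_SLOTS; i++) {
		if (!READ_ONCE(slots[i].addr))
			return &slots[i];
	}
	return NULL;	/* All slots busy: fall back to a backup slot. */
}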
> Could you utilize this[1] to see a
> comparison of the reader-side performance against RCU/SRCU?
Good point! Let's see.
On an AMD 2x EPYC 9654 96-Core Processor with 192 cores,
hyperthreading disabled,
CONFIG_PREEMPT=y,
CONFIG_PREEMPT_RCU=y,
CONFIG_PREEMPT_HAZPTR=y.
scale_type                ns
----------------------------
hazptr-smp-mb           13.1  <- this implementation
hazptr-barrier          11.5  <- replace smp_mb() on acquire with barrier(), requires IPIs on synchronize.
hazptr-smp-mb-hlist     12.7  <- replace per-task hp context and per-cpu overflow lists by hlist.
rcu                     17.0
srcu                    20.0
srcu-fast                1.5
rcu-tasks                0.0
rcu-trace                1.7
refcnt                1148.0
rwlock                1190.0
rwsem                 4199.3
lock                 41070.6
lock-irq             46176.3
acqrel                   1.1
So only srcu-fast, rcu-tasks, rcu-trace and a plain acqrel
appear to beat hazptr read-side performance.
[...]
>> +/*
>> + * Perform piecewise iteration on overflow list waiting until "addr" is
>> + * not present. Raw spinlock is released and taken between each list
>> + * item and busy loop iteration. The overflow list generation is checked
>> + * each time the lock is taken to validate that the list has not changed
>> + * before resuming iteration or busy wait. If the generation has
>> + * changed, retry the entire list traversal.
>> + */
>> +static
>> +void hazptr_synchronize_overflow_list(struct overflow_list *overflow_list, void *addr)
>> +{
>> +	struct hazptr_backup_slot *backup_slot;
>> +	uint64_t snapshot_gen;
>> +
>> +	raw_spin_lock(&overflow_list->lock);
>> +retry:
>> +	snapshot_gen = overflow_list->gen;
>> +	list_for_each_entry(backup_slot, &overflow_list->head, node) {
>> +		/* Busy-wait if node is found. */
>> +		while (smp_load_acquire(&backup_slot->slot.addr) == addr) { /* Load B */
>> +			raw_spin_unlock(&overflow_list->lock);
>> +			cpu_relax();
>
> I think we should prioritize the scan thread solution [2] instead of
> busy-waiting in hazard pointer updaters, because when we have multiple
> hazard pointer usages we would want to consolidate the scans from the
> updater side.
I agree that batching scans with a worker thread is a logical next step.
> If so, the whole ->gen can be avoided.
How would it allow removing the generation trick without causing long
raw spinlock latencies?
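
For reference, here is the shape of the piecewise traversal as I
understand it (a sketch only; the quote above is truncated, and the
re-lock path is filled in from the comment on the function, not taken
verbatim from the patch):

	raw_spin_lock(&overflow_list->lock);
retry:
	snapshot_gen = overflow_list->gen;
	list_for_each_entry(backup_slot, &overflow_list->head, node) {
		/* Busy-wait while this slot still protects @addr. */
		while (smp_load_acquire(&backup_slot->slot.addr) == addr) {
			raw_spin_unlock(&overflow_list->lock);
			cpu_relax();
			raw_spin_lock(&overflow_list->lock);
			/*
			 * The list (and thus @backup_slot) may be stale
			 * if it was modified while the lock was
			 * released: restart the whole traversal.
			 */
			if (overflow_list->gen != snapshot_gen)
				goto retry;
		}
	}
	raw_spin_unlock(&overflow_list->lock);

Without ->gen, dropping the lock mid-traversal would leave the iterator
pointing into a list that may have changed underneath it.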
>
> However this ->gen idea does seem to resolve another issue for me: I'm
> trying to make shazptr critical sections preemptible by using a per-task
> backup slot (if you recall, this is your idea from the hallway
> discussions we had during LPC 2024),
I honestly did not remember. It's been a whole year! ;-)
> and currently I cannot make it work because of the following sequence:
>
> 1. CPU 0 already has one pointer protected.
>
> 2. CPU 1 begins the updater scan: it scans the list of preempted
> hazard pointer readers and finds no reader.
>
> 3. CPU 0 does a context switch: it stores the current hazard pointer
> value to the current task's ->hazard_slot (let's say the task is task
> A), and adds it to the list of preempted hazard pointer readers.
>
> 4. CPU 0 clears its percpu hazptr_slots for the next task (B).
>
> 5. CPU 1 continues the updater scan: it scans the percpu slot of
> CPU 0 and finds no reader.
>
> In this situation, the updater will miss a reader. But if we add a
> generation snapshot at step 2 and a generation increment at step 3, I
> think it'll work.
>
> IMO, if we make this work, it's better than the current backup slot
> mechanism, because we only need to acquire the lock if a context
> switch happens.
With PREEMPT_HAZPTR we also only need to acquire the per-cpu overflow
list raw spinlock on context switch (preemption or blocking). The only
other case requiring it is nested hazptr usage (more than 8 active
hazard pointers) in a thread context plus nested irqs.
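
For concreteness, here is how I picture the context-switch handoff in
your per-task backup slot scheme (a sketch: every name below is made
up, including the assumed ->hazard_slot member on task_struct and a
single percpu slot as in shazptr):

struct task_hazptr_slot {		/* assumed type of tsk->hazard_slot */
	void			*addr;
	struct list_head	node;
};

struct preempted_readers {
	raw_spinlock_t		lock;
	struct list_head	head;
	u64			gen;
};

static struct preempted_readers preempted_readers;

/* Assumed percpu slot (one per CPU, as in shazptr). */
static DEFINE_PER_CPU(struct task_hazptr_slot, hazptr_percpu_slot);

/* Steps 3-4 of your scenario, run when @prev is switched out. */
static void hazptr_context_switch(struct task_struct *prev)
{
	struct task_hazptr_slot *slot = this_cpu_ptr(&hazptr_percpu_slot);
	void *addr = READ_ONCE(slot->addr);

	if (!addr)
		return;

	raw_spin_lock(&preempted_readers.lock);
	/* Step 3: hand the protected address off to the task... */
	WRITE_ONCE(prev->hazard_slot.addr, addr);
	list_add(&prev->hazard_slot.node, &preempted_readers.head);
	/*
	 * ...and invalidate any updater scan that snapshotted ->gen
	 * before this handoff (your step 2), so it restarts and sees
	 * the reader on the list instead of missing it between the
	 * list scan and the percpu slot scan.
	 */
	preempted_readers.gen++;
	raw_spin_unlock(&preempted_readers.lock);

	/* Step 4: free the percpu slot for the next task. */
	smp_store_release(&slot->addr, NULL);
}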
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com