Message-Id: <54E94704-92C0-4F10-AED9-46DFCA61233B@joelfernandes.org>
Date: Thu, 18 Dec 2025 05:33:02 -0500
From: Joel Fernandes <joel@...lfernandes.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Boqun Feng <boqun.feng@...il.com>,
 "Paul E. McKenney" <paulmck@...nel.org>, linux-kernel@...r.kernel.org,
 Nicholas Piggin <npiggin@...il.com>, Michael Ellerman <mpe@...erman.id.au>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 Will Deacon <will@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
 Alan Stern <stern@...land.harvard.edu>, John Stultz <jstultz@...gle.com>,
 Neeraj Upadhyay <Neeraj.Upadhyay@....com>,
 Linus Torvalds <torvalds@...ux-foundation.org>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Frederic Weisbecker <frederic@...nel.org>,
 Josh Triplett <josh@...htriplett.org>, Uladzislau Rezki <urezki@...il.com>,
 Steven Rostedt <rostedt@...dmis.org>,
 Lai Jiangshan <jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>,
 Ingo Molnar <mingo@...hat.com>, Waiman Long <longman@...hat.com>,
 Mark Rutland <mark.rutland@....com>, Thomas Gleixner <tglx@...utronix.de>,
 Vlastimil Babka <vbabka@...e.cz>, maged.michael@...il.com,
 Mateusz Guzik <mjguzik@...il.com>,
 Jonas Oberhauser <jonas.oberhauser@...weicloud.com>, rcu@...r.kernel.org,
 linux-mm@...ck.org, lkmm@...ts.linux.dev,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Subject: Re: [RFC PATCH v4 0/4] Hazard Pointers

Hi Mathieu,
Thanks for posting this.

> On Dec 17, 2025, at 8:45 PM, Mathieu Desnoyers <mathieu.desnoyers@...icios.com> wrote:
> 
> Hi,
> 
> Here is a revisited version of my Hazard Pointers series. Boqun, Joel,
> if you guys have time to try it out with your use-cases it would be
> great!
> 
> This new version does the following:
> 
> - It has 8 preallocated hazard pointer slots per CPU (one cache line),
> - The hazard pointer user allocates a hazard pointer context variable
>  (typically on the stack), which contains the pointer to the slot *and*
>  a backup slot,
> - When all the per-CPU slots are in use, fallback to the backup slot.
>  Chain the backup slot into per-CPU lists, each protected by a raw
>  spinlock.
> - The hazard pointer synchronize does a piecewise iteration on the
>  per-CPU overflow slots lists, releasing the raw spinlock between
>  each list item. It uses a 64-bit generation counter to check for
>  concurrent list changes, and restart the traversal on generation
>  counter mismatch.
> - There is a new CONFIG_PREEMPT_HAZPTR config option. When enabled,
>  the hazard pointer acquire/release adds and then removes the hazard
>  pointer context from a per-task linked list. On context switch, the
>  scheduler migrates the per-CPU slots used by the task to the backup
>  per-context slots, thus making sure the per-CPU slots are not used
>  by preempted and blocked tasks.

This last point is another reason I want the slots to be per task rather than per CPU. It is the natural fit: at the use-case level, a hazard pointer is always associated with a task, never with the CPU it happens to be running on. Putting the slots in the task struct satisfies these requirements directly, without any locking or list management. Did I miss something about the use cases?
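
To make this concrete, here is roughly the layout I have in mind. This is only a sketch, and all of the names (task_hazptr, nr_active, task_hazptr_try_acquire, and so on) are made up by me, not taken from your series:

#define HAZPTR_PER_TASK_SLOTS	8

/* Hypothetical per-task state, embedded in struct task_struct as ->hazptr. */
struct task_hazptr {
	void *slots[HAZPTR_PER_TASK_SLOTS];	/* published pointers, NULL when free */
	unsigned int nr_active;			/* number of slots currently in use */
};

/*
 * Publish @ptr in the first free slot of the current task. The caller must
 * re-read the source pointer after this returns, as usual for hazard pointers.
 */
static void **task_hazptr_try_acquire(void *ptr)
{
	struct task_hazptr *h = &current->hazptr;
	int i;

	for (i = 0; i < HAZPTR_PER_TASK_SLOTS; i++) {
		if (!h->slots[i]) {
			WRITE_ONCE(h->slots[i], ptr);
			h->nr_active++;
			/* Order the publication against later loads of *ptr. */
			smp_mb();
			return &h->slots[i];
		}
	}
	return NULL;	/* all slots busy: caller falls back to an overflow path */
}

Since only the owning task ever writes its own slots, acquire/release needs no atomics or locks; the synchronize side only reads them.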

I did some measurements on the task-scanning concern, and it is fast in my testing (~1ms per 10000 tasks). Any input from you or anyone else on the typical task counts we need to handle? I also made a rough prototype, and it turns out to be simpler, with fewer lines of code, because I do not need to handle preemption at all; it just falls out naturally.

First of all, we can keep a per-task counter of how many hazard pointers are active. If it is zero, we simply skip the task instead of wasting cycles scanning all of its slots. Further, we can keep a retire list and check every object on it during a single pass over the tasks, amortizing the scan cost. That also lends itself to retiring objects asynchronously from a dedicated thread, perhaps reusing the tasks-RCU infrastructure. We could also turn the per-task counter into a bitmap to potentially speed up the slot scan itself.
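
Roughly, the synchronize-side walk then becomes something like this (same caveat, the names are mine; memory barriers between the updater removing the object and this scan are omitted for brevity):

/* Hypothetical: return true if @ptr is still published by any task. */
static bool hazptr_scan_tasks(void *ptr)
{
	struct task_struct *g, *t;
	bool busy = false;
	int i;

	rcu_read_lock();
	for_each_process_thread(g, t) {
		/* Most tasks never use hazard pointers: one load and move on. */
		if (!READ_ONCE(t->hazptr.nr_active))
			continue;
		for (i = 0; i < HAZPTR_PER_TASK_SLOTS; i++) {
			if (READ_ONCE(t->hazptr.slots[i]) == ptr) {
				busy = true;
				goto out;
			}
		}
	}
out:
	rcu_read_unlock();
	return busy;
}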

I am okay with the concept of an overflow list, but if we keep it at the per-task level rather than the per-CPU level, IMO it is highly unlikely to be used at all unless a task has more than, say, eight hazard pointers active at once, so its lock would be contended far more rarely than a per-CPU overflow list. In fact I would expect contention to be incredibly rare, because hazard pointers are typically used by many tasks, each with its own private set of slots. A per-CPU overflow approach, by contrast, has a higher chance of lock contention, especially when the number of CPUs is low.
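
That is, the overflow path would only ever add to the current task's own list; a rough sketch (again, the names and layout are hypothetical):

/* Hypothetical per-task overflow list, used only when all slots are busy. */
struct hazptr_ctx {
	void *slot;
	struct list_head node;
};

struct task_hazptr_overflow {
	raw_spinlock_t lock;		/* taken only by the owner and by synchronize */
	struct list_head head;
};

static void task_hazptr_overflow_add(struct hazptr_ctx *ctx, void *ptr)
{
	struct task_hazptr_overflow *ov = &current->hazptr_overflow;

	ctx->slot = ptr;
	raw_spin_lock(&ov->lock);
	list_add(&ctx->node, &ov->head);
	raw_spin_unlock(&ov->lock);
}

The only other party ever taking that lock is a synchronize walking this particular task, so cross-task contention does not arise.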

Other than the task-scanning performance issue, what am I missing?

Another nice benefit of per-task hazard pointers is that we can allow sleeping inside hazard pointer sections, because the scan covers sleeping tasks as well.

By contrast, the per-CPU hazard pointer approaches I have seen forbid sleeping, since after sleeping a task is no longer associated with its CPU. They also have a higher likelihood of falling back to locking due to running out of slots.

Of course, I am still missing a concrete use case, but I suspect we can find a per-CPU refcount use case that benefits from this. I am researching use cases as I get time; I think my next task is to find a solid one before developing the solution further.

By the way, feedback on the scanning patch:

Could you consider a per-CPU counter tracking the number of active slots on each CPU? That way the scan can skip CPUs that are not using hazard pointers at all. Another idea is to also skip idle CPUs.
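
Something like the following (sketch only; hazptr_cpu and its fields stand in for whatever the real per-CPU state ends up being called in your series):

#define HAZPTR_NR_CPU_SLOTS	8

struct hazptr_cpu {
	void *slots[HAZPTR_NR_CPU_SLOTS];
	unsigned int nr_busy;		/* updated on acquire/release */
};
static DEFINE_PER_CPU(struct hazptr_cpu, hazptr_cpu);

static bool hazptr_scan_cpus(void *ptr)
{
	int cpu, i;

	for_each_possible_cpu(cpu) {
		struct hazptr_cpu *hc = &per_cpu(hazptr_cpu, cpu);

		/* Nothing published on this CPU: skip its slots entirely. */
		if (!READ_ONCE(hc->nr_busy))
			continue;
		for (i = 0; i < HAZPTR_NR_CPU_SLOTS; i++) {
			if (READ_ONCE(hc->slots[i]) == ptr)
				return true;
		}
	}
	return false;
}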

Have you also considered any asynchronous use case where maintaining a retired list would assist in RCU-style deferred reclaim of hazard-pointer objects?
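
For instance (sketch only, building on the hypothetical per-task slots above): retired objects go on a list, and one walk over the tasks covers every object in the batch:

struct hazptr_retired {
	struct list_head node;
	void *obj;
	void (*free_fn)(void *obj);
	bool busy;
};

static LIST_HEAD(hazptr_retire_list);
static DEFINE_SPINLOCK(hazptr_retire_lock);

/* Run from a dedicated kthread or workqueue. Nodes are kmalloc'ed at retire time. */
static void hazptr_reclaim_batch(void)
{
	struct hazptr_retired *r, *tmp;
	struct task_struct *g, *t;
	LIST_HEAD(batch);
	int i;

	spin_lock(&hazptr_retire_lock);
	list_splice_init(&hazptr_retire_list, &batch);
	spin_unlock(&hazptr_retire_lock);

	/* Single pass over all tasks, amortized over the whole batch. */
	rcu_read_lock();
	for_each_process_thread(g, t) {
		if (!READ_ONCE(t->hazptr.nr_active))
			continue;
		for (i = 0; i < HAZPTR_PER_TASK_SLOTS; i++) {
			void *p = READ_ONCE(t->hazptr.slots[i]);

			if (!p)
				continue;
			list_for_each_entry(r, &batch, node)
				if (r->obj == p)
					r->busy = true;
		}
	}
	rcu_read_unlock();

	list_for_each_entry_safe(r, tmp, &batch, node) {
		if (r->busy) {
			r->busy = false;	/* still protected, retry next pass */
			continue;
		}
		list_del(&r->node);
		r->free_fn(r->obj);
		kfree(r);
	}

	/* Requeue whatever is still protected. */
	spin_lock(&hazptr_retire_lock);
	list_splice(&batch, &hazptr_retire_list);
	spin_unlock(&hazptr_retire_lock);
}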

thanks,

 - Joel 

> 
> It is based on v6.18.1.
> 
> Review is very welcome,
> 
> Thanks,
> 
> Mathieu
> 
> Cc: Nicholas Piggin <npiggin@...il.com>
> Cc: Michael Ellerman <mpe@...erman.id.au>
> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
> Cc: Sebastian Andrzej Siewior <bigeasy@...utronix.de>
> Cc: "Paul E. McKenney" <paulmck@...nel.org>
> Cc: Will Deacon <will@...nel.org>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Boqun Feng <boqun.feng@...il.com>
> Cc: Alan Stern <stern@...land.harvard.edu>
> Cc: John Stultz <jstultz@...gle.com>
> Cc: Neeraj Upadhyay <Neeraj.Upadhyay@....com>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Boqun Feng <boqun.feng@...il.com>
> Cc: Frederic Weisbecker <frederic@...nel.org>
> Cc: Joel Fernandes <joel@...lfernandes.org>
> Cc: Josh Triplett <josh@...htriplett.org>
> Cc: Uladzislau Rezki <urezki@...il.com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Lai Jiangshan <jiangshanlai@...il.com>
> Cc: Zqiang <qiang.zhang1211@...il.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Waiman Long <longman@...hat.com>
> Cc: Mark Rutland <mark.rutland@....com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Vlastimil Babka <vbabka@...e.cz>
> Cc: maged.michael@...il.com
> Cc: Mateusz Guzik <mjguzik@...il.com>
> Cc: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
> Cc: rcu@...r.kernel.org
> Cc: linux-mm@...ck.org
> Cc: lkmm@...ts.linux.dev
> 
> Mathieu Desnoyers (4):
>  compiler.h: Introduce ptr_eq() to preserve address dependency
>  Documentation: RCU: Refer to ptr_eq()
>  hazptr: Implement Hazard Pointers
>  hazptr: Migrate per-CPU slots to backup slot on context switch
> 
> Documentation/RCU/rcu_dereference.rst |  38 +++-
> include/linux/compiler.h              |  63 +++++++
> include/linux/hazptr.h                | 241 ++++++++++++++++++++++++++
> include/linux/sched.h                 |   4 +
> init/init_task.c                      |   3 +
> init/main.c                           |   2 +
> kernel/Kconfig.preempt                |  10 ++
> kernel/Makefile                       |   2 +-
> kernel/fork.c                         |   3 +
> kernel/hazptr.c                       | 150 ++++++++++++++++
> kernel/sched/core.c                   |   2 +
> 11 files changed, 512 insertions(+), 6 deletions(-)
> create mode 100644 include/linux/hazptr.h
> create mode 100644 kernel/hazptr.c
> 
> --
> 2.39.5
> 
