Message-ID: <aURihbmjsKi8m7MO@tardis-2.local>
Date: Fri, 19 Dec 2025 05:22:29 +0900
From: Boqun Feng <boqun.feng@...il.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Joel Fernandes <joel@...lfernandes.org>,
"Paul E. McKenney" <paulmck@...nel.org>, linux-kernel@...r.kernel.org,
Nicholas Piggin <npiggin@...il.com>,
Michael Ellerman <mpe@...erman.id.au>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Will Deacon <will@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
Alan Stern <stern@...land.harvard.edu>, John Stultz <jstultz@...gle.com>,
Neeraj Upadhyay <Neeraj.Upadhyay@....com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Frederic Weisbecker <frederic@...nel.org>,
Josh Triplett <josh@...htriplett.org>,
Uladzislau Rezki <urezki@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang1211@...il.com>, Ingo Molnar <mingo@...hat.com>,
Waiman Long <longman@...hat.com>, Mark Rutland <mark.rutland@....com>,
Thomas Gleixner <tglx@...utronix.de>, Vlastimil Babka <vbabka@...e.cz>,
maged.michael@...il.com, Mateusz Guzik <mjguzik@...il.com>,
Jonas Oberhauser <jonas.oberhauser@...weicloud.com>, rcu@...r.kernel.org,
linux-mm@...ck.org, lkmm@...ts.linux.dev
Subject: Re: [RFC PATCH v4 3/4] hazptr: Implement Hazard Pointers
On Thu, Dec 18, 2025 at 12:35:18PM -0500, Mathieu Desnoyers wrote:
> On 2025-12-18 03:36, Boqun Feng wrote:
> > On Wed, Dec 17, 2025 at 08:45:30PM -0500, Mathieu Desnoyers wrote:
> [...]
> > > +static inline
> > > +void *hazptr_acquire(struct hazptr_ctx *ctx, void * const * addr_p)
> > > +{
> > > + struct hazptr_slot *slot = NULL;
> > > + void *addr, *addr2;
> > > +
> > > + /*
> > > + * Load @addr_p to know which address should be protected.
> > > + */
> > > + addr = READ_ONCE(*addr_p);
> > > + for (;;) {
> > > + if (!addr)
> > > + return NULL;
> > > + guard(preempt)();
> > > + if (likely(!hazptr_slot_is_backup(ctx, slot))) {
> > > + slot = hazptr_get_free_percpu_slot();
> >
> > I need to continue to share my concerns about this "allocating slot
> > while protecting" pattern. Realistically, we will go over a few of the
> > per-CPU hazard pointer slots *every time* instead of directly using a
> > pre-allocated hazard pointer slot.
>
> No, that's not the expected fast-path with CONFIG_PREEMPT_HAZPTR=y
> (introduced in patch 4/4).
>
I see, I missed patch #4; I will take a look and reply accordingly.
> With PREEMPT_HAZPTR, using more than one hazard pointer per CPU will
> only happen if there are nested hazard pointer users, which can happen
> due to:
>
> - Holding a hazard pointer across function calls, where another hazard
> pointer is used.
> - Using hazard pointers from interrupt handlers (note: my current code
> only does preempt disable, not irq disable; this is something I'd need
> to change if we wish to acquire/release hazard pointers from interrupt
> handlers). But even that should be a rare event.
>
> So the fast-path has an initial state where there are no hazard pointers
> in use on the CPU, which means hazptr_acquire() finds its empty slot at
> index 0.
>
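> To make that fast path concrete, here is a minimal sketch of the slot
> lookup (the per-CPU array layout and HAZPTR_SLOTS_PER_CPU are my
> assumptions for illustration, not the patch's exact definitions):
>
>	struct hazptr_percpu_slots {
>		struct hazptr_slot slot[HAZPTR_SLOTS_PER_CPU];
>	};
>	static DEFINE_PER_CPU(struct hazptr_percpu_slots, hazptr_slots);
>
>	/* Called with preemption disabled (see guard(preempt)() above). */
>	static struct hazptr_slot *hazptr_get_free_percpu_slot(void)
>	{
>		struct hazptr_percpu_slots *s = this_cpu_ptr(&hazptr_slots);
>		int i;
>
>		for (i = 0; i < HAZPTR_SLOTS_PER_CPU; i++) {
>			/* No nesting means slot 0 is free: O(1) fast path. */
>			if (!READ_ONCE(s->slot[i].addr))
>				return &s->slot[i];
>		}
>		return NULL;	/* all slots in use: fall back to a backup slot */
>	}
>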
> > Could you utilize this[1] to see a
> > comparison of the reader-side performance against RCU/SRCU?
>
> Good point! Let's see.
>
> On an AMD 2x EPYC 9654 96-Core Processor with 192 cores,
> hyperthreading disabled,
> CONFIG_PREEMPT=y,
> CONFIG_PREEMPT_RCU=y,
> CONFIG_PREEMPT_HAZPTR=y.
>
> scale_type ns
> -----------------------
> hazptr-smp-mb 13.1 <- this implementation
> hazptr-barrier 11.5 <- replace smp_mb() on acquire with barrier(), requires IPIs on synchronize.
> hazptr-smp-mb-hlist 12.7 <- replace per-task hp context and per-cpu overflow lists by hlist.
> rcu 17.0
> srcu 20.0
> srcu-fast 1.5
> rcu-tasks 0.0
> rcu-trace 1.7
> refcnt 1148.0
> rwlock 1190.0
> rwsem 4199.3
> lock 41070.6
> lock-irq 46176.3
> acqrel 1.1
>
> So only srcu-fast, rcu-tasks, rcu-trace and a plain acqrel
> appear to beat hazptr read-side performance.
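>
> For reference, the smp_mb() being measured here is the usual hazard
> pointer publish/validate fence on the acquire side; paraphrasing the
> protocol (not the patch's literal code):
>
>	for (;;) {
>		WRITE_ONCE(slot->addr, addr);	/* publish the hazard pointer */
>		smp_mb();	/* order publication before re-validation */
>		addr2 = READ_ONCE(*addr_p);	/* is the object still there? */
>		if (likely(addr2 == addr))
>			break;			/* protected */
>		addr = addr2;			/* changed under us */
>		if (!addr) {
>			WRITE_ONCE(slot->addr, NULL);
>			break;			/* object gone */
>		}
>		/* otherwise retry with the new address */
>	}
>
> The hazptr-barrier variant demotes that smp_mb() to a compiler
> barrier() and has synchronize force the ordering with IPIs instead,
> which presumably accounts for the 13.1 -> 11.5 ns gap.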
>
Could you also measure the reader-side performance impact when the
percpu hazard pointer slots are used up, i.e., the worst case?
> [...]
>
> > > +/*
> > > + * Perform piecewise iteration on overflow list waiting until "addr" is
> > > + * not present. Raw spinlock is released and taken between each list
> > > + * item and busy loop iteration. The overflow list generation is checked
> > > + * each time the lock is taken to validate that the list has not changed
> > > + * before resuming iteration or busy wait. If the generation has
> > > + * changed, retry the entire list traversal.
> > > + */
> > > +static
> > > +void hazptr_synchronize_overflow_list(struct overflow_list *overflow_list, void *addr)
> > > +{
> > > + struct hazptr_backup_slot *backup_slot;
> > > + uint64_t snapshot_gen;
> > > +
> > > + raw_spin_lock(&overflow_list->lock);
> > > +retry:
> > > + snapshot_gen = overflow_list->gen;
> > > + list_for_each_entry(backup_slot, &overflow_list->head, node) {
> > > + /* Busy-wait if node is found. */
> > > + while (smp_load_acquire(&backup_slot->slot.addr) == addr) { /* Load B */
> > > + raw_spin_unlock(&overflow_list->lock);
> > > + cpu_relax();
> >
> > I think we should prioritize the scan thread solution [2] instead of
> > busy-waiting in hazard pointer updaters, because when we have multiple
> > hazard pointer usages we would want to consolidate the scans from the
> > updater side.
>
> I agree that batching scans with a worker thread is a logical next step.
>
> > If so, the whole ->gen can be avoided.
>
> How would it allow removing the generation trick without causing long
> raw spinlock latencies?
>
Because we won't need to busy-wait for the readers to go away, we can
check whether they are still there in the next scan. So:

	list_for_each_entry(backup_slot, &overflow_list->head, node) {
		/* Defer freeing instead of busy-waiting if the node is found. */
		if (smp_load_acquire(&backup_slot->slot.addr) == addr) { /* Load B */
			<mark addr as not-yet-freeable and move on>
		}
	}
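
In other words, a scan pass would look roughly like the following
(hazptr_is_protected() and struct hazptr_retired_node are hypothetical
names for this sketch, not from the patch):

	struct hazptr_retired_node {
		struct list_head node;
		void *addr;
		void (*free)(void *addr);
	};

	/* One pass of a dedicated scan thread over the retired list. */
	static void hazptr_scan_pass(struct list_head *retired)
	{
		struct hazptr_retired_node *rn, *tmp;

		list_for_each_entry_safe(rn, tmp, retired, node) {
			/*
			 * Check the percpu slots and the overflow lists.
			 * No busy-waiting: a still-protected address just
			 * stays on the retired list until the next pass.
			 */
			if (hazptr_is_protected(rn->addr))
				continue;
			list_del(&rn->node);
			rn->free(rn->addr);
		}
	}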
> >
> > However, this ->gen idea does seem to resolve another issue for me: I'm
> > trying to make shazptr critical sections preemptible by using a per-task
> > backup slot (if you recall, this is your idea from the hallway
> > discussions we had during LPC 2024),
>
> I honestly did not remember. It's been a whole year! ;-)
>
> > and currently I could not make it work because of the following
> > sequence:
> >
> > 1. CPU 0 already has one pointer protected.
> >
> > 2. CPU 1 begins the updater scan, and it scans the list of preempted
> > hazard pointer readers, no reader.
> >
> > 3. CPU 0 does a context switch: it stores the current hazard pointer
> > value to the current task's ->hazard_slot (let's say the task is task
> > A), and adds it to the list of preempted hazard pointer readers.
> >
> > 4. CPU 0 clears its percpu hazptr_slots for the next task (B).
> >
> > 5. CPU 1 continues the updater scan, and it scans the percpu slot of
> > CPU 0, and finds no reader.
> >
> > In this situation, the updater will miss a reader. But if we add
> > generation snapshotting at step 2 and a generation increment at step 3,
> > I think it'll work, as sketched below.
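> >
> > Roughly (all helper names here are hypothetical, and the barriers that
> > order the increment before the slot clear are elided):
> >
> >	/* Updater scan, steps 2 and 5: */
> >	do {
> >		gen = READ_ONCE(preempted_readers_gen);		/* snapshot */
> >		scan_preempted_reader_list();			/* step 2 */
> >		scan_percpu_slots();				/* step 5 */
> >		/* gen changed: a context switch raced with us, rescan. */
> >	} while (READ_ONCE(preempted_readers_gen) != gen);
> >
> >	/* Context switch, steps 3 and 4 (reader-list lock elided): */
> >	save_percpu_slots_to(&current->hazard_slot);		/* step 3 */
> >	list_add(&current->hazard_node, &preempted_readers);	/* step 3 */
> >	WRITE_ONCE(preempted_readers_gen,
> >		   preempted_readers_gen + 1);			/* step 3 */
> >	clear_percpu_slots();					/* step 4 */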
> >
> > IMO, if we make this work, it's better than the current backup slot
> > mechanism, because we only need to acquire the lock if a context
> > switch happens.
>
> With PREEMPT_HAZPTR we also only need to acquire the per-cpu overflow
> list raw spinlock on context switch (preemption or blocking). The only
Indeed, pre-allocating the slot on the stack to save the percpu slot
when a context switch happens seems easier and quite smart ;-) Let me
take a look.
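
For my own understanding, I'd sketch that sched-out path roughly as
follows (my reading of the idea, not the patch's actual code):

	/*
	 * On sched out: migrate the active percpu slot into the backup
	 * slot that hazptr_acquire() pre-allocated on the stack, so the
	 * percpu slot is free for the next task. Publish the backup
	 * entry before clearing the percpu slot.
	 */
	raw_spin_lock(&overflow_list->lock);
	backup_slot->slot.addr = percpu_slot->addr;
	list_add(&backup_slot->node, &overflow_list->head);
	overflow_list->gen++;	/* force concurrent traversals to retry */
	raw_spin_unlock(&overflow_list->lock);
	smp_store_release(&percpu_slot->addr, NULL);	/* slot reusable */
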
Regards,
Boqun
> other case requiring it is hazptr nested usage (more than 8 active
> hazptr) on a thread context + nested irqs.
>
> Thanks,
>
> Mathieu
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com