[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZvY2zBiluLkqRSkc@boqun-archlinux>
Date: Thu, 26 Sep 2024 21:38:36 -0700
From: Boqun Feng <boqun.feng@...il.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
Jonas Oberhauser <jonas.oberhauser@...weicloud.com>,
linux-kernel@...r.kernel.org, rcu@...r.kernel.org,
linux-mm@...ck.org, lkmm@...ts.linux.dev,
"Paul E. McKenney" <paulmck@...nel.org>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>,
Uladzislau Rezki <urezki@...il.com>,
Steven Rostedt <rostedt@...dmis.org>,
Lai Jiangshan <jiangshanlai@...il.com>,
Zqiang <qiang.zhang1211@...il.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>,
Waiman Long <longman@...hat.com>,
Mark Rutland <mark.rutland@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Kent Overstreet <kent.overstreet@...il.com>,
Vlastimil Babka <vbabka@...e.cz>, maged.michael@...il.com,
Neeraj Upadhyay <neeraj.upadhyay@....com>
Subject: Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard
pointers
On Fri, Sep 27, 2024 at 03:20:40AM +0200, Mathieu Desnoyers wrote:
> On 2024-09-26 18:12, Linus Torvalds wrote:
> > On Thu, 26 Sept 2024 at 08:54, Jonas Oberhauser
> > <jonas.oberhauser@...weicloud.com> wrote:
> > >
> > > No, the issue introduced by the compiler optimization (or by your
> > > original patch) is that the CPU can speculatively load from the first
> > > pointer as soon as it has completed the load of that pointer:
> >
> > You mean the compiler can do it. The inline asm has no impact on what
> > the CPU does. The conditional isn't a barrier for the actual hardware.
> > But once the compiler doesn't try to do it, the data dependency on the
> > address does end up being an ordering constraint on the hardware too
> > (I'm happy to say that I haven't heard from the crazies that want
> > value prediction in a long time).
> >
> > Just use a barrier. Or make sure to use the proper ordered memory
> > accesses when possible. Don't use an inline asm for the compare - we
> > don't even have anything insane like that as a portable helper, and we
> > shouldn't have it.
>
> How does the compiler barrier help in any way here ?
>
> I am concerned about the compiler SSA GVN (Global Value Numbering)
> optimizations, and I don't think a compiler barrier solves anything.
> (or I'm missing something obvious)
I think you're right, a compiler barrier doesn't help here:
head = READ_ONCE(p);
smp_mb();
WRITE_ONCE(*slot, head);
ptr = READ_ONCE(p);
if (ptr != head) {
...
} else {
barrier();
return ptr;
}
compilers can replace 'ptr' with 'head' because of the equality, and
even putting barrier() here cannot prevent compiler to rewrite the else
branch into:
else {
barrier();
return head;
}
because that's just using a different register, unrelated to memory
accesses.
Jonas, am I missing something subtle? Or this is different than what you
proposed?
Regards,
Boqun
>
> I was concerned about the suggestion from Jonas to use "node2"
> rather than "node" after the equality check as a way to ensure
> the intended register is used to return the pointer, because after
> the SSA GVN optimization pass, AFAIU this won't help in any way.
> I have a set of examples below that show gcc use the result of the
> first load, and clang use the result of the second load (on
> both x86-64 and aarch64). Likewise when a load-acquire is used as
> second load, which I find odd. Hopefully mixing this optimization
> from gcc with speculation still abide by the memory model.
>
> Only the asm goto approach ensures that gcc uses the result from
> the second load.
>
[...]
Powered by blists - more mailing lists