Message-ID: <54487a36-f74c-46c3-aed7-fc86eaaa9ca2@huaweicloud.com>
Date: Thu, 26 Sep 2024 18:40:30 +0200
From: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Boqun Feng <boqun.feng@...il.com>, linux-kernel@...r.kernel.org,
rcu@...r.kernel.org, linux-mm@...ck.org, lkmm@...ts.linux.dev,
"Paul E. McKenney" <paulmck@...nel.org>,
Frederic Weisbecker <frederic@...nel.org>,
Neeraj Upadhyay <neeraj.upadhyay@...nel.org>,
Joel Fernandes <joel@...lfernandes.org>,
Josh Triplett <josh@...htriplett.org>, Uladzislau Rezki <urezki@...il.com>,
Steven Rostedt <rostedt@...dmis.org>, Lai Jiangshan
<jiangshanlai@...il.com>, Zqiang <qiang.zhang1211@...il.com>,
Peter Zijlstra <peterz@...radead.org>, Ingo Molnar <mingo@...hat.com>,
Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>,
Mark Rutland <mark.rutland@....com>, Thomas Gleixner <tglx@...utronix.de>,
Kent Overstreet <kent.overstreet@...il.com>, Vlastimil Babka
<vbabka@...e.cz>, maged.michael@...il.com,
Neeraj Upadhyay <neeraj.upadhyay@....com>
Subject: Re: [RFC PATCH 1/4] hazptr: Add initial implementation of hazard
pointers
On 9/26/2024 at 6:12 PM, Linus Torvalds wrote:
> On Thu, 26 Sept 2024 at 08:54, Jonas Oberhauser
> <jonas.oberhauser@...weicloud.com> wrote:
>>
>> No, the issue introduced by the compiler optimization (or by your
>> original patch) is that the CPU can speculatively load from the first
>> pointer as soon as it has completed the load of that pointer:
>
> You mean the compiler can do it.
What I mean is that if we only use rcu_dereference for the second load
(and not either some form of compiler barrier or an acquire load), then
the compiler can transform the second program from my previous e-mail
(which if mapped 1:1 to hardware would be correct because hardware
ensures the ordering based on the address dependency) into the first one
(which is incorrect).
In particular, the compiler can change

    if (node == node2) t = *node2;

into

    if (node == node2) t = *node;

and then the CPU can speculatively read *node before knowing the value
of node2.
The compiler can also speculatively read *node in this case, but that is
not what I meant.
The code in Mathieu's original patch is already like the latter one and
is broken even if the compiler does not do any optimizations.
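To make the two shapes concrete, here is a minimal sketch in portable C11. It is not the code from the patch: the names (struct node, node_slot, load_via_second, load_via_first) are hypothetical, and relaxed atomic loads stand in for the kernel's READ_ONCE(). In a single thread both functions always return the same value, which is exactly why the compiler considers the substitution legal.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node { int data; };

/* Hypothetical shared slot; relaxed loads stand in for READ_ONCE(). */
typedef _Atomic(struct node *) node_slot;

/*
 * Dereferences the pointer obtained by the SECOND load. At source level
 * the read of ->data has an address dependency on node2, but nothing
 * here stops the compiler from rewriting it into the form below.
 */
static int load_via_second(node_slot *slot, int fallback)
{
    struct node *node  = atomic_load_explicit(slot, memory_order_relaxed);
    struct node *node2 = atomic_load_explicit(slot, memory_order_relaxed);

    if (node != NULL && node == node2)
        return node2->data;
    return fallback;
}

/*
 * Dereferences the pointer obtained by the FIRST load. The read of
 * ->data depends only on node, so a CPU may issue it before the load
 * of node2 has completed.
 */
static int load_via_first(node_slot *slot, int fallback)
{
    struct node *node  = atomic_load_explicit(slot, memory_order_relaxed);
    struct node *node2 = atomic_load_explicit(slot, memory_order_relaxed);

    if (node != NULL && node == node2)
        return node->data;
    return fallback;
}
```

The difference only matters when another thread updates the slot concurrently, so a single-threaded test cannot tell the two apart.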
> The inline asm has no impact on what
> the CPU does. The conditional isn't a barrier for the actual hardware.
> But once the compiler doesn't try to do it, the data dependency on the
> address does end up being an ordering constraint on the hardware too
Exactly. The inline asm would prevent the compiler from doing the
transformation though, which would mean that the address dependency
appears in the final compiler output.
> Just use a barrier. Or make sure to use the proper ordered memory
> accesses when possible.
>
> Don't use an inline asm for the compare - we
> don't even have anything insane like that as a portable helper, and we
> shouldn't have it.
I'm glad you say that :))
I would also just use a barrier before returning the pointer.
Boqun seems to be unhappy with a barrier though, because it would
theoretically also forbid unrelated optimizations.
But I have not seen any evidence that there are any unrelated
optimizations going on in the first place that would be forbidden by this.
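As an illustration of the "proper ordered memory accesses" option, here is a rough sketch in portable C11 where the second load is an acquire load. This is only an assumption about how it could look, not the kernel API (kernel code would use something like smp_load_acquire()), and struct item, item_slot and reload_and_check are made-up names.

```c
#include <stdatomic.h>
#include <stddef.h>

struct item { int payload; };

/* Hypothetical shared slot that another thread may update. */
typedef _Atomic(struct item *) item_slot;

/*
 * Load the pointer twice and return it only if both loads agree.
 * Because the second load is an acquire load, later loads in this
 * thread are ordered after it, regardless of whether the compiler
 * ends up dereferencing "first" or "second".
 */
static struct item *reload_and_check(item_slot *slot)
{
    struct item *first  = atomic_load_explicit(slot, memory_order_relaxed);
    struct item *second = atomic_load_explicit(slot, memory_order_acquire);

    return first == second ? second : NULL;
}
```

With this shape the ordering no longer relies on the address dependency surviving compilation, at the price of an acquire on architectures where that is not free.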
Have fun,
jonas