Message-ID: <20210416160139.GF4212@paulmck-ThinkPad-P17-Gen-1>
Date: Fri, 16 Apr 2021 09:01:39 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
Will Deacon <will@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
lttng-dev <lttng-dev@...ts.lttng.org>
Subject: Re: liburcu: LTO breaking rcu_dereference on arm64 and possibly
other architectures ?
On Fri, Apr 16, 2021 at 05:17:11PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 16, 2021 at 10:52:16AM -0400, Mathieu Desnoyers wrote:
> > Hi Paul, Will, Peter,
> >
> > I noticed in this discussion https://lkml.org/lkml/2021/4/16/118 that LTO
> > is able to break rcu_dereference. This seems to be taken care of by
> > arch/arm64/include/asm/rwonce.h on arm64 in the Linux kernel tree.
> >
> > In the liburcu user-space library, we have this comment near rcu_dereference() in
> > include/urcu/static/pointer.h:
> >
> > * The compiler memory barrier in CMM_LOAD_SHARED() ensures that value-speculative
> > * optimizations (e.g. VSS: Value Speculation Scheduling) does not perform the
> > * data read before the pointer read by speculating the value of the pointer.
> > * Correct ordering is ensured because the pointer is read as a volatile access.
> > * This acts as a global side-effect operation, which forbids reordering of
> > * dependent memory operations. Note that such concern about dependency-breaking
> > * optimizations will eventually be taken care of by the "memory_order_consume"
> > * addition to forthcoming C++ standard.
> >
> > (note: CMM_LOAD_SHARED() is the equivalent of READ_ONCE(), but was introduced in
> > liburcu as a public API before READ_ONCE() existed in the Linux kernel)
> >
> > Peter tells me the "memory_order_consume" is not something which can be used today.
> > Any information on its status at C/C++ standard levels and implementation-wise ?
Actually, you really can use memory_order_consume today. All current
implementations compile it as if it were memory_order_acquire.
This works correctly, but may be slower than you would like on weakly
ordered architectures such as ARM and PowerPC, where an acquire load
costs more than a plain load.
On strongly ordered architectures such as x86, the only penalty is
forgone compiler optimizations, so it is less of a problem there.
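For concreteness, here is a minimal sketch of what a consume-based
pointer read looks like in plain C11. The names (struct config,
shared_config, publish_config, read_config_value) are made up for
illustration and are not liburcu API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload type, for illustration only. */
struct config {
    int value;
};

/* Shared pointer published by a writer and dereferenced by readers. */
static _Atomic(struct config *) shared_config;

/* Writer side: fully initialize the object, then publish it with release. */
static void publish_config(struct config *c, int value)
{
    assert(c != NULL);
    c->value = value;
    atomic_store_explicit(&shared_config, c, memory_order_release);
}

/*
 * Reader side: consume load of the pointer. Current compilers treat this
 * as an acquire load, so the dependent read of c->value cannot be moved
 * ahead of the pointer load.
 */
static int read_config_value(int fallback)
{
    struct config *c =
        atomic_load_explicit(&shared_config, memory_order_consume);

    return c ? c->value : fallback;
}
```

A reader that runs before any publication sees NULL and returns the
fallback; after publish_config() it sees the fully initialized object.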
> > Pragmatically speaking, what should we change in liburcu to ensure we don't generate
> > broken code when LTO is enabled ? I suspect there are a few options here:
> >
> > 1) Fail to build if LTO is enabled,
> > 2) Generate slower code for rcu_dereference, either on all architectures or only
> > on weakly-ordered architectures,
> > 3) Generate different code depending on whether LTO is enabled or not. AFAIU this would only
> > work if every compile unit is aware that it will end up being optimized with LTO. Not sure
> > how this could be done in the context of user-space.
> > 4) [ Insert better idea here. ]
Use memory_order_consume if LTO is enabled. That will work now, and
might generate good code in some hoped-for future.
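One way this could look, as a rough sketch only: select the load at
compile time. MY_RCU_USE_ATOMIC_CONSUME is a hypothetical flag that the
build system is assumed to pass whenever LTO is enabled, and
my_rcu_dereference(), my_publish() and read_head_data() are made-up
names, not liburcu API. The sketch relies on the GCC/Clang __atomic
builtins and __typeof__ extension.

```c
#include <assert.h>
#include <stddef.h>

/*
 * MY_RCU_USE_ATOMIC_CONSUME is a hypothetical build flag, assumed to be
 * set by the build system (e.g. -DMY_RCU_USE_ATOMIC_CONSUME) with LTO.
 */
#ifdef MY_RCU_USE_ATOMIC_CONSUME
/* Ask the compiler for dependency ordering explicitly. */
#define my_rcu_dereference(p) __atomic_load_n(&(p), __ATOMIC_CONSUME)
#else
/* Traditional variant: volatile load, relying on the address dependency. */
#define my_rcu_dereference(p) (*(volatile __typeof__(p) *)&(p))
#endif

/* Hypothetical RCU-protected structure, for illustration only. */
struct node {
    int data;
};

static struct node *head;

/* Writer side: publish a fully initialized node with release ordering. */
static void my_publish(struct node *n)
{
    assert(n != NULL);
    __atomic_store_n(&head, n, __ATOMIC_RELEASE);
}

/* Reader side: load the pointer once, then dereference that copy. */
static int read_head_data(int fallback)
{
    struct node *n = my_rcu_dereference(head);

    return n ? n->data : fallback;
}
```

Both variants have the same interface, so callers do not change; only
the ordering guarantee requested from the compiler differs.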
> > Thoughts ?
>
> Using memory_order_acquire is safe; and is basically what Will did for
> ARM64.
>
> The problematic transformations are possible even without LTO, although
> less likely due to less visibility, but everybody agrees they're
> possible and allowed.
>
> OTOH we do not have a positive sighting of it actually happening (I
> think), we're all just being cautious and not willing to debug the
> resulting wreckage if it does indeed happen.
And yes, you can also use memory_order_acquire.
Thanx, Paul