[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230704094627.GS4253@hirez.programming.kicks-ass.net>
Date: Tue, 4 Jul 2023 11:46:27 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Olivier Dion <odion@...icios.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, rnk@...gle.com,
Alan Stern <stern@...land.harvard.edu>,
Andrea Parri <parri.andrea@...il.com>,
Will Deacon <will@...nel.org>,
Boqun Feng <boqun.feng@...il.com>,
Nicholas Piggin <npiggin@...il.com>,
David Howells <dhowells@...hat.com>,
Jade Alglave <j.alglave@....ac.uk>,
Luc Maranget <luc.maranget@...ia.fr>,
"Paul E. McKenney" <paulmck@...nel.org>,
Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Tom Rix <trix@...hat.com>, linux-kernel@...r.kernel.org,
linux-arch@...r.kernel.org, gcc@....gnu.org, llvm@...ts.linux.dev
Subject: Re: [RFC] Bridging the gap between the Linux Kernel Memory
Consistency Model (LKMM) and C11/C++11 atomics
On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
> int x = 0;
> int y = 0;
> int r0, r1;
>
> int dummy;
>
> void t0(void)
> {
> __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
>
> __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
> __atomic_thread_fence(__ATOMIC_SEQ_CST);
>
> r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
> }
>
> void t1(void)
> {
> __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
> __atomic_thread_fence(__ATOMIC_SEQ_CST);
> r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
> }
>
> // BUG_ON(r0 == 0 && r1 == 0)
>
> On x86-64 (gcc 13.1 -O2) we get:
>
> t0():
> movl $1, x(%rip)
> movl $1, %eax
> xchgl dummy(%rip), %eax
> lock orq $0, (%rsp) ;; Redundant with previous exchange.
> movl y(%rip), %eax
> movl %eax, r0(%rip)
> ret
> t1():
> movl $1, y(%rip)
> lock orq $0, (%rsp)
> movl x(%rip), %eax
> movl %eax, r1(%rip)
> ret
So I would expect the compilers to do better here. It should know those
__atomic_thread_fence() thingies are superfluous and simply not emit
them. This could even be done as a peephole pass later, where it sees
consecutive atomic ops and the second being a no-op.
> On x86-64 (clang 16 -O2) we get:
>
> t0():
> movl $1, x(%rip)
> movl $1, %eax
> xchgl %eax, dummy(%rip)
> mfence ;; Redundant with previous exchange.
And that's just terrible :/ Nobody should be using MFENCE for this. And
using MFENCE after a LOCK prefixes instruction (implicit in this case)
is just fail, because I don't think C++ atomics cover MMIO and other
such 'lovely' things.
> movl y(%rip), %eax
> movl %eax, r0(%rip)
> retq
> t1():
> movl $1, y(%rip)
> mfence
> movl x(%rip), %eax
> movl %eax, r1(%rip)
> retq
Powered by blists - more mailing lists