Message-ID: <CAH6eHdQQWO2AYQRXnAATt6nvcyDjKj-_5Ktt2ze6F158hBon=Q@mail.gmail.com>
Date:   Tue, 4 Jul 2023 11:23:46 +0100
From:   Jonathan Wakely <jwakely.gcc@...il.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Olivier Dion <odion@...icios.com>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        rnk@...gle.com, Alan Stern <stern@...land.harvard.edu>,
        Andrea Parri <parri.andrea@...il.com>,
        Will Deacon <will@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Nicholas Piggin <npiggin@...il.com>,
        David Howells <dhowells@...hat.com>,
        Jade Alglave <j.alglave@....ac.uk>,
        Luc Maranget <luc.maranget@...ia.fr>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Nathan Chancellor <nathan@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Tom Rix <trix@...hat.com>, linux-kernel@...r.kernel.org,
        linux-arch@...r.kernel.org, gcc@....gnu.org, llvm@...ts.linux.dev
Subject: Re: [RFC] Bridging the gap between the Linux Kernel Memory
 Consistency Model (LKMM) and C11/C++11 atomics

On Tue, 4 Jul 2023 at 10:47, Peter Zijlstra wrote:
>
> On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
>
> >   int x = 0;
> >   int y = 0;
> >   int r0, r1;
> >
> >   int dummy;
> >
> >   void t0(void)
> >   {
> >           __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
> >
> >           __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
> >           __atomic_thread_fence(__ATOMIC_SEQ_CST);
> >
> >           r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
> >   }
> >
> >   void t1(void)
> >   {
> >           __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
> >           __atomic_thread_fence(__ATOMIC_SEQ_CST);
> >           r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
> >   }
> >
> >   // BUG_ON(r0 == 0 && r1 == 0)
> >
> > On x86-64 (gcc 13.1 -O2) we get:
> >
> >   t0():
> >           movl    $1, x(%rip)
> >           movl    $1, %eax
> >           xchgl   dummy(%rip), %eax
> >           lock orq $0, (%rsp)       ;; Redundant with previous exchange.
> >           movl    y(%rip), %eax
> >           movl    %eax, r0(%rip)
> >           ret
> >   t1():
> >           movl    $1, y(%rip)
> >           lock orq $0, (%rsp)
> >           movl    x(%rip), %eax
> >           movl    %eax, r1(%rip)
> >           ret
>
> So I would expect the compilers to do better here. It should know those
> __atomic_thread_fence() thingies are superfluous and simply not emit
> them. This could even be done as a peephole pass later, where it sees
> consecutive atomic ops and the second being a no-op.

Right, I don't see why we need a whole set of new built-ins that say
"this fence isn't needed if the adjacent atomic op already implies a
fence". If the adjacent atomic op already implies a fence for a given
ISA, then the compiler should already be able to elide the explicit
fence.

So just write your code with the explicit fence, and rely on the
compiler to optimize it properly. Admittedly, today's compilers don't
do that optimization well, but they also don't support your proposed
built-ins, so you're going to have to wait for compilers to make
improvements either way.

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
discusses how compilers could (and should) optimize around atomics
better.

>
> > On x86-64 (clang 16 -O2) we get:
> >
> >   t0():
> >           movl    $1, x(%rip)
> >           movl    $1, %eax
> >           xchgl   %eax, dummy(%rip)
> >           mfence                    ;; Redundant with previous exchange.
>
> And that's just terrible :/ Nobody should be using MFENCE for this. And
> using MFENCE after a LOCK prefixed instruction (implicit in this case)
> is just fail, because I don't think C++ atomics cover MMIO and other
> such 'lovely' things.
>
> >           movl    y(%rip), %eax
> >           movl    %eax, r0(%rip)
> >           retq
> >   t1():
> >           movl    $1, y(%rip)
> >           mfence
> >           movl    x(%rip), %eax
> >           movl    %eax, r1(%rip)
> >           retq
>
