linux-kernel - Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20230704094627.GS4253@hirez.programming.kicks-ass.net>
Date:   Tue, 4 Jul 2023 11:46:27 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Olivier Dion <odion@...icios.com>
Cc:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, rnk@...gle.com,
        Alan Stern <stern@...land.harvard.edu>,
        Andrea Parri <parri.andrea@...il.com>,
        Will Deacon <will@...nel.org>,
        Boqun Feng <boqun.feng@...il.com>,
        Nicholas Piggin <npiggin@...il.com>,
        David Howells <dhowells@...hat.com>,
        Jade Alglave <j.alglave@....ac.uk>,
        Luc Maranget <luc.maranget@...ia.fr>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Nathan Chancellor <nathan@...nel.org>,
        Nick Desaulniers <ndesaulniers@...gle.com>,
        Tom Rix <trix@...hat.com>, linux-kernel@...r.kernel.org,
        linux-arch@...r.kernel.org, gcc@....gnu.org, llvm@...ts.linux.dev
Subject: Re: [RFC] Bridging the gap between the Linux Kernel Memory
 Consistency Model (LKMM) and C11/C++11 atomics

On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:

>   int x = 0;
>   int y = 0;
>   int r0, r1;
> 
>   int dummy;
> 
>   void t0(void)
>   {
>           __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
> 
>           __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
>           __atomic_thread_fence(__ATOMIC_SEQ_CST);
> 
>           r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
>   }
> 
>   void t1(void)
>   {
>           __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
>           __atomic_thread_fence(__ATOMIC_SEQ_CST);
>           r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
>   }
> 
>   // BUG_ON(r0 == 0 && r1 == 0)
> 
> On x86-64 (gcc 13.1 -O2) we get:
> 
>   t0():
>           movl    $1, x(%rip)
>           movl    $1, %eax
>           xchgl   dummy(%rip), %eax
>           lock orq $0, (%rsp)       ;; Redundant with previous exchange.
>           movl    y(%rip), %eax
>           movl    %eax, r0(%rip)
>           ret
>   t1():
>           movl    $1, y(%rip)
>           lock orq $0, (%rsp)
>           movl    x(%rip), %eax
>           movl    %eax, r1(%rip)
>           ret

So I would expect the compilers to do better here. It should know those
__atomic_thread_fence() thingies are superfluous and simply not emit
them. This could even be done as a peephole pass later, where it sees
consecutive atomic ops and the second being a no-op.

> On x86-64 (clang 16 -O2) we get:
> 
>   t0():
>           movl    $1, x(%rip)
>           movl    $1, %eax
>           xchgl   %eax, dummy(%rip)
>           mfence                    ;; Redundant with previous exchange.

And that's just terrible :/ Nobody should be using MFENCE for this. And
using MFENCE after a LOCK prefixes instruction (implicit in this case)
is just fail, because I don't think C++ atomics cover MMIO and other
such 'lovely' things.

>           movl    y(%rip), %eax
>           movl    %eax, r0(%rip)
>           retq
>   t1():
>           movl    $1, y(%rip)
>           mfence
>           movl    x(%rip), %eax
>           movl    %eax, r1(%rip)
>           retq