linux-kernel - RE: "Verifying and Optimizing Compact NUMA-Aware Locks on Weak Memory Models"

Message-ID: <7ad2354bf993435b917f278d4199a6ff@huawei.com>
Date:   Mon, 12 Sep 2022 11:10:17 +0000
From:   Hernan Luis Ponce de Leon <hernanl.leon@...wei.com>
To:     Jonas Oberhauser <jonas.oberhauser@...wei.com>,
        Joel Fernandes <joel@...lfernandes.org>
CC:     Alan Stern <stern@...land.harvard.edu>,
        Boqun Feng <boqun.feng@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        "parri.andrea@...il.com" <parri.andrea@...il.com>,
        "will@...nel.org" <will@...nel.org>,
        "npiggin@...il.com" <npiggin@...il.com>,
        "dhowells@...hat.com" <dhowells@...hat.com>,
        "j.alglave@....ac.uk" <j.alglave@....ac.uk>,
        "luc.maranget@...ia.fr" <luc.maranget@...ia.fr>,
        "akiyks@...il.com" <akiyks@...il.com>,
        "dlustig@...dia.com" <dlustig@...dia.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>
Subject: RE: "Verifying and Optimizing Compact NUMA-Aware Locks on Weak Memory
 Models"

> Therefore this hang should be observable on a hypothetical LKMM processor
> which makes use of all the relaxed liberty the LKMM allows. However according
> to the authors of that paper (who are my colleagues but I haven't been involved
> deeply in that work), not even Power+gcc allow this reordering to happen, and if
> that's true it is probably because the wmb is mapped to lwsync which is fully
> cumulative in Power but not in LKMM.

All the "issues" we mention in the technical report are issues with respect to LKMM.
As shown by (*) below, as soon as the code is compiled and verified against the
corresponding hardware memory model, it is correct.

Here is a small variant of the litmus test I sent earlier in which the "problematic
behavior" is not only allowed by LKMM but actually violates liveness.
The code is written in C (the main function and headers are omitted) and cannot be used
directly with herd7, since I am not sure whether the spinloop at the end of thread_3 can
be written in herd7 syntax.

------------------------------------------------------------------------
int y, z;
atomic_t x;

void *thread_1(void *unused)
{   
    // clear_pending_set_locked
    int r0 = atomic_fetch_add(2,&x) ;
}

void *thread_2(void *unused)
{
    // this store breaks liveness
    WRITE_ONCE(y, 1);
    // queued_spin_trylock
    int r0 = atomic_read(&x);
    // barrier after the initialisation of nodes
    smp_wmb();
    // xchg_tail
    int r1 = atomic_cmpxchg_relaxed(&x,r0,42);
    // link node into the waitqueue
    WRITE_ONCE(z, 1);
}

void *thread_3(void *unused)
{
    // node initialisation
    WRITE_ONCE(z, 2);
    // queued_spin_trylock
    int r0 = atomic_read(&x);
    // barrier after the initialisation of nodes
    smp_wmb();
    // if we read z==2 we expect to read this store
    WRITE_ONCE(y, 0);
    // xchg_tail
    int r1 = atomic_cmpxchg_relaxed(&x,r0,24);
    // spinloop
    while(READ_ONCE(y) == 1 && (READ_ONCE(z) == 2)) {}
}
------------------------------------------------------------------------

Liveness is violated (following Theorem 5.3 of the "Making weak memory models fair" paper)
because the reads in the spinloop can get their values from writes that come last in the
coherence / modification order, and those values do not stop the spinning.

------------------------------------------------------------------------
$ java -jar $DAT3M_HOME/dartagnan/target/dartagnan-3.1.0.jar cat/linux-kernel.cat --target=lkmm --property=liveness liveness.c
...
Liveness violation found
FAIL
------------------------------------------------------------------------

(*) However, if the code is compiled (the tool does this transformation automatically and
internally; note the --target option) and checked against a hardware memory model, the
tool says the code is correct:

------------------------------------------------------------------------
$ java -jar $DAT3M_HOME/dartagnan/target/dartagnan-3.1.0.jar cat/aarch64.cat --target=arm8 --property=liveness liveness.c
...
PASS

$ java -jar $DAT3M_HOME/dartagnan/target/dartagnan-3.1.0.jar cat/power.cat --target=power --property=liveness liveness.c
...
PASS

$ java -jar $DAT3M_HOME/dartagnan/target/dartagnan-3.1.0.jar cat/riscv.cat --target=riscv --property=liveness liveness.c
...
PASS
------------------------------------------------------------------------

I think it is possible to show the liveness violation with herd7 using the following variant of the code:

------------------------------------------------------------------------
C Liveness
{
  atomic_t x = ATOMIC_INIT(0);
  int y = 0;
  int z = 0;
}


P0(atomic_t *x) {
  // clear_pending_set_locked
  int r0 = atomic_fetch_add(2,x) ;
}

P1(atomic_t *x, int *z, int *y) {
  // this store breaks liveness
  WRITE_ONCE(*y, 1);
  // queued_spin_trylock
  int r0 = atomic_read(x);
  // barrier after the initialisation of nodes
  smp_wmb();
  // xchg_tail
  int r1 = atomic_cmpxchg_relaxed(x,r0,42);
  // link node into the waitqueue
  WRITE_ONCE(*z, 1);
}

P2(atomic_t *x,int *z, int *y) {
  // node initialisation
  WRITE_ONCE(*z, 2);
  // queued_spin_trylock
  int r0 = atomic_read(x);
  // barrier after the initialisation of nodes
  smp_wmb();
  // if we read z==2 we expect to read this store
  WRITE_ONCE(*y, 0);
  // xchg_tail
  int r1 = atomic_cmpxchg_relaxed(x,r0,24);
  // spinloop
  int r2 = READ_ONCE(*y);  
  int r3 = READ_ONCE(*z);  
}

exists (z=2 /\ y=1 /\ 2:r2=1 /\ 2:r3=2)
------------------------------------------------------------------------

Condition "2:r3=2" forces the spinloop read of z to read from the first write in P2, and
"z=2" forces this write to be last in the coherence order. Conditions "2:r2=1" and "y=1"
do the same for the read of y: it reads the store of 1 from P1, which is last in the
coherence order.
herd7 says this behavior is allowed by LKMM, showing that liveness can be violated.

In all the examples above, if we use smp_mb() instead of smp_wmb(), LKMM does not accept
the behavior and thus liveness is guaranteed.

Hernan