linux-kernel - Re: Some observations (results) on BPF acquire and release

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zxor8xosL-XSxnwr@andrea>
Date: Thu, 24 Oct 2024 14:13:55 +0300
From: Andrea Parri <parri.andrea@...il.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: puranjay@...nel.org, bpf@...r.kernel.org, lkmm@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: Some observations (results) on BPF acquire and release

On Wed, Oct 23, 2024 at 09:25:40PM -0700, Paul E. McKenney wrote:
> On Wed, Oct 23, 2024 at 08:47:44PM +0300, Andrea Parri wrote:
> > Hi Puranjay and Paul,
> > 
> > I'm running some experiment on the (experimental) formalization of BPF
> > acquire and release available from [1] and wanted to report about some
> > (initial) observations for discussion and possibly future developments;
> > apologies in advance for the relatively long email and any repetition.
> > 
> > 
> > A first and probably most important observation is that the (current)
> > formalization of acquire and release appears to be "too strong": IIUC,
> > the simplest example/illustration for this is given by the following
> > 
> > BPF R+release+fence
> > {
> >  0:r2=x; 0:r4=y;
> >  1:r2=y; 1:r4=x; 1:r6=l;
> > }
> >  P0                                 | P1                                         ;
> >  r1 = 1                             | r1 = 2                                     ;
> >  *(u32 *)(r2 + 0) = r1              | *(u32 *)(r2 + 0) = r1                      ;
> >  r3 = 1                             | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
> >  store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0)                      ;
> > exists ([y]=2 /\ 1:r3=0)
> > 
> > This "exists" condition is not satisfiable according to the BPF model;
> > however, if we adopt the "natural"/intended(?) PowerPC implementations
> > of the synchronization primitives above (aka, with store_release() -->
> > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
> > condition in question becomes (architecturally) satisfiable on PowerPC
> > (although I'm not aware of actual observations on PowerPC hardware).
> 
> Yes, you are quite right, for efficient use on PowerPC, we need the BPF
> memory model to allow the above cycle in the R litmus test.  My bad,
> as I put too much emphasis on ARM64.
> 
> > At first, the previous observation (validated via simulations and later
> > extended to similar but more complex scenarios ) made me believe that
> > the BPF formalization of acquire and release could be strictly stronger
> > than the corresponding LKMM formalization; but that is _not_ the case:
> > 
> > The following "exists" condition is satisfiable according to the BPF
> > model (and it remains satisfiable even if the load_acquire() in P2 is
> > paired with an additional store_release() in P1).  In contrast, the
> > corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire()
> > and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the
> > same conclusion holds even if the putative smp_load_acquire() in P2 is
> > "replaced" with an smp_rmb() or with an address dependency).
> > 
> > BPF Z6.3+fence+fence+acquire
> > {
> >  0:r2=x; 0:r4=y; 0:r6=l;
> >  1:r2=y; 1:r4=z; 1:r6=m;
> >  2:r2=z; 2:r4=x;
> > }
> >  P0                                         | P1                                         | P2                                 ;
> >  r1 = 1                                     | r1 = 2                                     | r1 = load_acquire((u32 *)(r2 + 0)) ;
> >  *(u32 *)(r2 + 0) = r1                      | *(u32 *)(r2 + 0) = r1                      | r3 = *(u32 *)(r4 + 0)              ;
> >  r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) |                                    ;
> >  r3 = 1                                     | r3 = 1                                     |                                    ;
> >  *(u32 *)(r4 + 0) = r3                      | *(u32 *)(r4 + 0) = r3                      |                                    ;
> > exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0)
> 
> And again agreed, we do want to forbid Z6.3.
> 
> > These remarks show that the proposed BPF formalization of acquire and
> > release somehow, but substantially, diverged from the corresponding
> > LKMM formalization.  My guess is that the divergences mentioned above
> > were not (fully) intentional, or I'm wondering -- why not follow the
> > latter (the LKMM's) more closely? -  This is probably the first question
> > I would need to clarify before trying/suggesting modifications to the
> > present formalizations.  ;-)  Thoughts?
> 
> Thank you for digging into this!
> 
> I clearly need to get my validation work going again, but I very much
> welcome any further help you would be willing to provide.

Thanks for the confirmation.

The BPF tests above (and other I have) were all hand-written, but I'm
working towards improving such automation/validation; I won't keep it
a secret should I find something relevant/interesting.  :-)

But the subset of the LKMM which deals with "strong fences" and Acq &
Rel (limited to so called marked accesses) seems relatively contained
/simple:  its analysis could be useful, if not determining, in trying
to resolve the above issues.

  Andrea