[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zxor8xosL-XSxnwr@andrea>
Date: Thu, 24 Oct 2024 14:13:55 +0300
From: Andrea Parri <parri.andrea@...il.com>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: puranjay@...nel.org, bpf@...r.kernel.org, lkmm@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: Re: Some observations (results) on BPF acquire and release
On Wed, Oct 23, 2024 at 09:25:40PM -0700, Paul E. McKenney wrote:
> On Wed, Oct 23, 2024 at 08:47:44PM +0300, Andrea Parri wrote:
> > Hi Puranjay and Paul,
> >
> > I'm running some experiment on the (experimental) formalization of BPF
> > acquire and release available from [1] and wanted to report about some
> > (initial) observations for discussion and possibly future developments;
> > apologies in advance for the relatively long email and any repetition.
> >
> >
> > A first and probably most important observation is that the (current)
> > formalization of acquire and release appears to be "too strong": IIUC,
> > the simplest example/illustration for this is given by the following
> >
> > BPF R+release+fence
> > {
> > 0:r2=x; 0:r4=y;
> > 1:r2=y; 1:r4=x; 1:r6=l;
> > }
> > P0 | P1 ;
> > r1 = 1 | r1 = 2 ;
> > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ;
> > r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
> > store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ;
> > exists ([y]=2 /\ 1:r3=0)
> >
> > This "exists" condition is not satisfiable according to the BPF model;
> > however, if we adopt the "natural"/intended(?) PowerPC implementations
> > of the synchronization primitives above (aka, with store_release() -->
> > LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
> > condition in question becomes (architecturally) satisfiable on PowerPC
> > (although I'm not aware of actual observations on PowerPC hardware).
>
> Yes, you are quite right, for efficient use on PowerPC, we need the BPF
> memory model to allow the above cycle in the R litmus test. My bad,
> as I put too much emphasis on ARM64.
>
> > At first, the previous observation (validated via simulations and later
> > extended to similar but more complex scenarios ) made me believe that
> > the BPF formalization of acquire and release could be strictly stronger
> > than the corresponding LKMM formalization; but that is _not_ the case:
> >
> > The following "exists" condition is satisfiable according to the BPF
> > model (and it remains satisfiable even if the load_acquire() in P2 is
> > paired with an additional store_release() in P1). In contrast, the
> > corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire()
> > and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the
> > same conclusion holds even if the putative smp_load_acquire() in P2 is
> > "replaced" with an smp_rmb() or with an address dependency).
> >
> > BPF Z6.3+fence+fence+acquire
> > {
> > 0:r2=x; 0:r4=y; 0:r6=l;
> > 1:r2=y; 1:r4=z; 1:r6=m;
> > 2:r2=z; 2:r4=x;
> > }
> > P0 | P1 | P2 ;
> > r1 = 1 | r1 = 2 | r1 = load_acquire((u32 *)(r2 + 0)) ;
> > *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 | r3 = *(u32 *)(r4 + 0) ;
> > r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | ;
> > r3 = 1 | r3 = 1 | ;
> > *(u32 *)(r4 + 0) = r3 | *(u32 *)(r4 + 0) = r3 | ;
> > exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0)
>
> And again agreed, we do want to forbid Z6.3.
>
> > These remarks show that the proposed BPF formalization of acquire and
> > release somehow, but substantially, diverged from the corresponding
> > LKMM formalization. My guess is that the divergences mentioned above
> > were not (fully) intentional, or I'm wondering -- why not follow the
> > latter (the LKMM's) more closely? - This is probably the first question
> > I would need to clarify before trying/suggesting modifications to the
> > present formalizations. ;-) Thoughts?
>
> Thank you for digging into this!
>
> I clearly need to get my validation work going again, but I very much
> welcome any further help you would be willing to provide.
Thanks for the confirmation.
The BPF tests above (and other I have) were all hand-written, but I'm
working towards improving such automation/validation; I won't keep it
a secret should I find something relevant/interesting. :-)
But the subset of the LKMM which deals with "strong fences" and Acq &
Rel (limited to so called marked accesses) seems relatively contained
/simple: its analysis could be useful, if not determining, in trying
to resolve the above issues.
Andrea
Powered by blists - more mailing lists