[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <daa60273-d01a-8fc5-5e26-e8fc9364c1d8@huaweicloud.com>
Date: Fri, 25 Oct 2024 11:32:20 +0200
From: Hernan Ponce de Leon <hernan.poncedeleon@...weicloud.com>
To: Andrea Parri <parri.andrea@...il.com>, puranjay@...nel.org,
paulmck@...nel.org
Cc: bpf@...r.kernel.org, lkmm@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: Some observations (results) on BPF acquire and release
On 10/23/2024 7:47 PM, Andrea Parri wrote:
> Hi Puranjay and Paul,
>
> I'm running some experiment on the (experimental) formalization of BPF
> acquire and release available from [1] and wanted to report about some
> (initial) observations for discussion and possibly future developments;
> apologies in advance for the relatively long email and any repetition.
>
>
> A first and probably most important observation is that the (current)
> formalization of acquire and release appears to be "too strong": IIUC,
> the simplest example/illustration for this is given by the following
>
> BPF R+release+fence
> {
> 0:r2=x; 0:r4=y;
> 1:r2=y; 1:r4=x; 1:r6=l;
> }
> P0 | P1 ;
> r1 = 1 | r1 = 2 ;
> *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 ;
> r3 = 1 | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) ;
> store_release((u32 *)(r4 + 0), r3) | r3 = *(u32 *)(r4 + 0) ;
> exists ([y]=2 /\ 1:r3=0)
>
> This "exists" condition is not satisfiable according to the BPF model;
> however, if we adopt the "natural"/intended(?) PowerPC implementations
> of the synchronization primitives above (aka, with store_release() -->
> LWSYNC and atomic_fetch_add() --> SYNC ; [...] ), then we see that the
> condition in question becomes (architecturally) satisfiable on PowerPC
> (although I'm not aware of actual observations on PowerPC hardware).
Are the resulting PPC tests available somewhere?
>
>
> At first, the previous observation (validated via simulations and later
> extended to similar but more complex scenarios ) made me believe that
> the BPF formalization of acquire and release could be strictly stronger
> than the corresponding LKMM formalization; but that is _not_ the case:
>
> The following "exists" condition is satisfiable according to the BPF
> model (and it remains satisfiable even if the load_acquire() in P2 is
> paired with an additional store_release() in P1). In contrast, the
> corresponding LKMM condition (e.g load_acquire() --> smp_load_acquire()
> and atomic_fetch_add() --> smp_mb()) is not satisfiable (in fact, the
> same conclusion holds even if the putative smp_load_acquire() in P2 is
> "replaced" with an smp_rmb() or with an address dependency).
>
> BPF Z6.3+fence+fence+acquire
> {
> 0:r2=x; 0:r4=y; 0:r6=l;
> 1:r2=y; 1:r4=z; 1:r6=m;
> 2:r2=z; 2:r4=x;
> }
> P0 | P1 | P2 ;
> r1 = 1 | r1 = 2 | r1 = load_acquire((u32 *)(r2 + 0)) ;
> *(u32 *)(r2 + 0) = r1 | *(u32 *)(r2 + 0) = r1 | r3 = *(u32 *)(r4 + 0) ;
> r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | r5 = atomic_fetch_add((u32 *)(r6 + 0), r5) | ;
> r3 = 1 | r3 = 1 | ;
> *(u32 *)(r4 + 0) = r3 | *(u32 *)(r4 + 0) = r3 | ;
> exists ([y]=2 /\ 2:r1=1 /\ 2:r3=0)
>
>
> These remarks show that the proposed BPF formalization of acquire and
> release somehow, but substantially, diverged from the corresponding
> LKMM formalization. My guess is that the divergences mentioned above
> were not (fully) intentional, or I'm wondering -- why not follow the
> latter (the LKMM's) more closely? - This is probably the first question
> I would need to clarify before trying/suggesting modifications to the
> present formalizations. ;-) Thoughts?
>
> Andrea
>
>
> [1] https://github.com/puranjaymohan/herdtools7/commits/bpf_acquire_release/
Powered by blists - more mailing lists