Message-ID: <Y9lH97/Hw+W90gj0@rowland.harvard.edu>
Date: Tue, 31 Jan 2023 11:55:19 -0500
From: Alan Stern <stern@...land.harvard.edu>
To: Jonas Oberhauser <jonas.oberhauser@...weicloud.com>
Cc: Andrea Parri <parri.andrea@...il.com>, paulmck@...nel.org,
will@...nel.org, peterz@...radead.org, boqun.feng@...il.com,
npiggin@...il.com, dhowells@...hat.com, j.alglave@....ac.uk,
luc.maranget@...ia.fr, akiyks@...il.com, dlustig@...dia.com,
joel@...lfernandes.org, urezki@...il.com, quic_neeraju@...cinc.com,
frederic@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/2] tools/memory-model: Make ppo a subrelation of po
On Tue, Jan 31, 2023 at 04:33:25PM +0100, Jonas Oberhauser wrote:
>
>
> On 1/31/2023 4:06 PM, Alan Stern wrote:
> > On Tue, Jan 31, 2023 at 02:56:00PM +0100, Jonas Oberhauser wrote:
> > > I have some additional thoughts now. It seems that you could weaken the
> > > operational model by stating that an A-cumulative fence orders propagation
> > > of all *external* stores (in addition to all po-earlier stores) that
> > > propagated to you before the fence is executed.
> > How is that a weakening of the operational model? It's what the
> > operational model says right now.
>
> No — as in the part that you have quoted, it is stated that an A-cumulative
> fence orders propagation of *all* stores that propagated to you before the
> fence is executed.
> I'm saying you could weaken this to cover only *external* stores.
Okay, now I understand.
> More precisely, I would change
>
> > For each other CPU C', any store which propagates to C before
> > a release fence is executed (including all po-earlier
> > stores executed on C) is forced to propagate to C' before the
> > store associated with the release fence does.
>
> Into something like
>
>
> 	For each other CPU C', any *external* store which propagates to C
> 	before a release fence is executed, as well as any po-earlier
> 	store executed on C, is forced to propagate to C' before the
> 	store associated with the release fence does.
>
> The difference is that po-later stores that happen to propagate to C before
> the release fence is executed would no longer be ordered.
> That should be consistent with the axiomatic model.
I had to check that it wouldn't affect the (xbstar & int) part of vis,
but it looks all right. This seems like a reasonable change.
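[For readers following along: the `vis` relation in question is defined in
tools/memory-model/linux-kernel.cat roughly as follows — quoted from memory
of recent kernels, so check the tree for the authoritative text:

```
let vis = cumul-fence* ; rfe? ; [Marked] ;
	((strong-fence ; [Marked] ; xbstar) | (xbstar & int))
```

The `(xbstar & int)` disjunct is the part that had to be checked against the
proposed weakening.]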
However, it only fixes part of the problem. Suppose an external write
is read by an instruction po-after the release-store, but the read
executes before the release-store. The operational model would still
say the external write has to obey the propagation ordering, whereas the
formal model doesn't require it.
> > > > P0(int *x, int *y, int *z)
> > > > {
> > > > 	int r1;
> > > >
> > > > 	r1 = READ_ONCE(*x);
> > > > 	smp_store_release(y, 1);
> > > > 	WRITE_ONCE(*z, 1);
> > > > }
> > > >
> > > > P1(int *x, int *y, int *z)
> > > > {
> > > > 	int r2;
> > > >
> > > > 	r2 = READ_ONCE(*z);
> > > > 	WRITE_ONCE(*x, r2);
> > > > }
> > > >
> > > > P2(int *x, int *y, int *z)
> > > > {
> > > > 	int r3;
> > > > 	int r4;
> > > >
> > > > 	r3 = READ_ONCE(*y);
> > > > 	smp_rmb();
> > > > 	r4 = READ_ONCE(*z);
> > > > }
> > > >
> > > > exists (0:r1=1 /\ 2:r3=1 /\ 2:r4=0)
> > > I could imagine that P0 posts both of its stores in a shared store buffer
> > > before reading *x, but marks the release store as "not ready".
> > > Then P1 forwards *z=1 from the store buffer and posts *x=1, which P0 reads,
> > > and subsequently marks its release store as "ready".
> > That isn't how release stores are meant to work. The read of x is
> > supposed to be complete before the release store becomes visible to any
> > other CPU.
>
> Note that the release store isn't observed until it becomes "ready", so it
> is really indistinguishable whether or not it has become visible to any
> other CPU.
> Indeed stores that aren't marked "ready" would be ignored during forwarding,
> and not allowed to be pushed to the cache.
Oops, I mixed up a couple of the accesses. Okay, yes, this mechanism
will allow writes that are po-after a release store but execute before
it to evade the propagation restriction.
> The reason this kind of implementation seems less natural to me is that such
> a "not ready" store would need to be pushed back in the buffer (if it is
> the head of the buffer and the cache is ready to take a store), stall the
> later stores, or be aborted until it becomes ready.
> That just seems to create a lot of hassle for no discernible benefit.
> A "not ready" store probably shouldn't be put into a store queue, even if
> the only reason it is not ready is that there are some otherwise unrelated
> reads that haven't completed yet.
>
>
>
> > This is true even in C11.
>
> Arguable... The following pseudo-code litmus test should demonstrate this:
>
> P0 {
> 	int r = read_relaxed(&x);
> 	store_release(&y,1);
> }
>
>
> P1 {
> 	int s = read_relaxed(&y);
> 	store_release(&x,1);
> }
>
> In C11, it should be possible to read r==s==1.
True, in C11 releases don't mean anything unless they're paired with
acquires. But if your P1 had been
	int s = read_acquire(&y);
	write_relaxed(&x, 1);
then r = s = 1 would not be allowed. And presumably the same object
code would be generated for P0 either way, particularly if P1 was in a
separate compilation unit (link-time optimization notwithstanding).
> Btw, how to proceed for your SRCU patch and this one?
> Are you planning to make any changes? I think the version you have is ok if
> you don't think the patch is improved by anything I brought up.
I don't see any need to change the SRCU patch at this point, other than
to improve the attributions.
> Any additional concerns/changes for this patch?
It should give the same data-race diagnostics as the current LKMM. This
probably means the patch will need to punch up the definitions of
*-pre-bounded and *-post-bounded, unless you can think of a better
approach.
Alan