linux-kernel - Re: Internal vs. external barriers (was: Re: Interesting LKMM litmus test)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230125194651.GN2948950@paulmck-ThinkPad-P17-Gen-1>
Date:   Wed, 25 Jan 2023 11:46:51 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Alan Stern <stern@...land.harvard.edu>
Cc:     Jonas Oberhauser <jonas.oberhauser@...weicloud.com>,
        Andrea Parri <parri.andrea@...il.com>,
        Jonas Oberhauser <jonas.oberhauser@...wei.com>,
        Peter Zijlstra <peterz@...radead.org>, will <will@...nel.org>,
        "boqun.feng" <boqun.feng@...il.com>, npiggin <npiggin@...il.com>,
        dhowells <dhowells@...hat.com>,
        "j.alglave" <j.alglave@....ac.uk>,
        "luc.maranget" <luc.maranget@...ia.fr>, akiyks <akiyks@...il.com>,
        dlustig <dlustig@...dia.com>, joel <joel@...lfernandes.org>,
        urezki <urezki@...il.com>,
        quic_neeraju <quic_neeraju@...cinc.com>,
        frederic <frederic@...nel.org>,
        Kernel development list <linux-kernel@...r.kernel.org>
Subject: Re: Internal vs. external barriers (was: Re: Interesting LKMM litmus
 test)

On Wed, Jan 25, 2023 at 02:08:59PM -0500, Alan Stern wrote:
> On Wed, Jan 25, 2023 at 09:18:32AM -0800, Paul E. McKenney wrote:
> > ------------------------------------------------------------------------
> > 
> > C C-srcu-observed-4
> > 
> > (*
> >  * Result: Sometimes
> >  *
> >  * The Linux-kernel implementation is suspected to forbid this.
> >  *)
> > 
> > {}
> > 
> > P0(int *x, int *y, int *z, struct srcu_struct *s)
> > {
> > 	int r1;
> > 
> > 	r1 = srcu_read_lock(s);
> > 	WRITE_ONCE(*y, 2);
> > 	WRITE_ONCE(*x, 1);
> > 	srcu_read_unlock(s, r1);
> > }
> > 
> > P1(int *x, int *y, int *z, struct srcu_struct *s)
> > {
> > 	int r1;
> > 
> > 	WRITE_ONCE(*y, 1);
> > 	synchronize_srcu(s);
> > 	WRITE_ONCE(*z, 2);
> > }
> > 
> > P2(int *x, int *y, int *z, struct srcu_struct *s)
> > {
> > 	WRITE_ONCE(*z, 1);
> > 	smp_store_release(x, 2);
> > }
> > 
> > exists (x=1 /\ y=1 /\ z=1)
> > 
> > ------------------------------------------------------------------------
> > 
> > We get the following from herd7:
> > 
> > ------------------------------------------------------------------------
> > 
> > $ herd7 -conf linux-kernel.cfg C-srcu-observed-4.litmus
> > Test C-srcu-observed-4 Allowed
> > States 8
> > x=1; y=1; z=1;
> > x=1; y=1; z=2;
> > x=1; y=2; z=1;
> > x=1; y=2; z=2;
> > x=2; y=1; z=1;
> > x=2; y=1; z=2;
> > x=2; y=2; z=1;
> > x=2; y=2; z=2;
> > Ok
> > Witnesses
> > Positive: 1 Negative: 7
> > Condition exists (x=1 /\ y=1 /\ z=1)
> > Observation C-srcu-observed-4 Sometimes 1 7
> > Time C-srcu-observed-4 0.02
> > Hash=8b6020369b73ac19070864a9db00bbf8
> > 
> > ------------------------------------------------------------------------
> > 
> > This does not seem to me to be consistent with your "The RCU guarantee
> > about writes in a read-side critical section becoming visible to all
> > CPUs before a later grace period ends".
> 
> Let's see.  That guarantee requires only that x=1 and y=2 become visible 
> to P1 and P2 before the grace period ends.  And since synchronize_srcu 
> is a strong fence, y=1 must become visible to P0 and P2 before the grace 
> period ends.  Presumably after y=2 does, because it overwrites y=2.  
> Okay so far.
> 
> Now at some point P2 executes x=2.  If this were to happen after the 
> grace period ended, it would overwrite x=1.  Therefore it must happen 
> before the grace period ends, and therefore P2 must also write z=1 
> before the grace period ends.
> 
> So we have P2 writing z=1 before P1 writes z=2.  But this doesn't mean 
> z=2 has to overwrite z=1!  (You had a diagram illustrating this point in 
> one of your own slides for a talk about the LKMM.)  Overwriting is 
> required only when the earlier write becomes visible to the later 
> write's CPU before the later write occurs, and nothing in this test 
> forces z=2 to propagate to P1 before the z=1 write executes.
> 
> So the litmus test's outcome can happen without violating my guarantee.

Makes sense, thank you!

> > So what am I missing here?
> 
> Can't tell.  I'm not sure why you think the litmus test isn't consistent 
> with the guarantee.

I was missing that additional non-temporal co link.

> > Again, I am OK with LKMM allowing C-srcu-observed-4.litmus, as long as
> > the actual Linux-kernel implementation forbids it.
> 
> Why do you want the implementation to forbid it?  The pattern of the 
> litmus test resembles 3+3W, and you don't care whether the kernel allows 
> that pattern.  Do you?

Jonas asked a similar question, so I am answering you both here.

With (say) a release-WRITE_ONCE() chain implementing N+2W for some
N, it is reasonably well known that you don't get ordering, hardware
support otwithstanding.  After all, none of the Linux kernel, C, and C++
memory models make that guarantee.  In addition, the non-RCU barriers
and accesses that you can use to create N+2W have been in very wide use
for a very long time.

Although RCU has been in use for almost as long as those non-RCU barriers,
it has not been in wide use for anywhere near that long.  So I cannot
be so confident in ruling out some N+2W use case for RCU.

Such a use case could play out as follows:

1.	They try LKMM on it, see that LKMM allows it, and therefore find
	something else that works just as well.  This is fine.

2.	They try LKMM on it, see that LKMM allows it, but cannot find
	something else that works just as well.  They complain to us,
	and we either show them how to get the same results some other
	way or adjust LKMM (and perhaps the implementations) accordingly.
	These are also fine.

3.	They don't try LKMM on it, see that it works when they test it,
	and they send it upstream.  The use case is entangled deeply
	enough in other code that no one spots it on review.  The Linux
	kernel unconditionally prohibits the cycle.  This too is fine.

4.	They don't try LKMM on it, see that it works when they test it,
	and they send it upstream.  The use case is entangled deeply
	enough in other code that no one spots it on review.  Because RCU
	grace periods incur tens of microseconds of latency at a minimum,
	all tests (almost) always pass, just due to delays and unrelated
	accesses and memory barriers.  Even in kernels built with some
	future SRCU equivalent of CONFIG_RCU_STRICT_GRACE_PERIOD=y.
	But the Linux kernel allows the cycle when there is a new moon
	on Tuesday during a triple solar eclipse of Jupiter, a condition
	that is eventually met, and at the worst possible time and place.

	This is absolutely the opposite of fine.

I don't want to deal with #4.  So this is an RCU-maintainer use case
that I would like to avoid.  ;-)

							Thanx, Paul