Date:   Sat, 21 Jan 2023 12:10:26 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Alan Stern <stern@...land.harvard.edu>
Cc:     Jonas Oberhauser <jonas.oberhauser@...weicloud.com>,
        Andrea Parri <parri.andrea@...il.com>,
        Jonas Oberhauser <jonas.oberhauser@...wei.com>,
        Peter Zijlstra <peterz@...radead.org>, will <will@...nel.org>,
        "boqun.feng" <boqun.feng@...il.com>, npiggin <npiggin@...il.com>,
        dhowells <dhowells@...hat.com>,
        "j.alglave" <j.alglave@....ac.uk>,
        "luc.maranget" <luc.maranget@...ia.fr>, akiyks <akiyks@...il.com>,
        dlustig <dlustig@...dia.com>, joel <joel@...lfernandes.org>,
        urezki <urezki@...il.com>,
        quic_neeraju <quic_neeraju@...cinc.com>,
        frederic <frederic@...nel.org>,
        Kernel development list <linux-kernel@...r.kernel.org>
Subject: Re: Internal vs. external barriers (was: Re: Interesting LKMM litmus
 test)

On Sat, Jan 21, 2023 at 02:56:57PM -0500, Alan Stern wrote:
> On Sat, Jan 21, 2023 at 10:40:32AM -0800, Paul E. McKenney wrote:
> > On Sat, Jan 21, 2023 at 12:36:26PM -0500, Alan Stern wrote:
> > > On Fri, Jan 20, 2023 at 10:41:14PM +0100, Jonas Oberhauser wrote:
> > > > 
> > > > 
> > > > On 1/20/2023 5:18 PM, Alan Stern wrote:
> > > > > On Fri, Jan 20, 2023 at 11:13:00AM +0100, Jonas Oberhauser wrote:
> > > > > > Perhaps we could say that reading an index without using it later is
> > > > > > forbidden?
> > > > > > 
> > > > > > flag ~empty [Srcu-lock];data;rf;[~ domain(data;[Srcu-unlock])] as
> > > > > > thrown-srcu-cookie-on-floor
> > > > > We already flag locks that don't have a matching unlock.
> > > > 
> > > > Of course, but as you know this is completely orthogonal.
> > > 
> > > Yeah, okay.  It doesn't hurt to add this check, but the check isn't 
> > > complete.  For example, it won't catch the invalid usage here:
> > > 
> > > P0(srcu_struct *ss)
> > > {
> > > 	int r1, r2;
> > > 
> > > 	r1 = srcu_read_lock(ss);
> > > 	srcu_read_unlock(ss, r1);
> > > 	r2 = srcu_read_lock(ss);
> > > 	srcu_read_unlock(ss, r2);
> > > }
> > > 
> > > exists (~0:r1=0:r2)
> > > 
> > > On the other hand, how often will people make this sort of mistake in 
> > > their litmus tests?  My guess is not very.
> > 
> > I must be blind this morning.  I see a well-formed pair of back-to-back
> > SRCU read-side critical sections.  A rather useless pair, given that
> > both are empty,
> 
> And there are no synchronize_srcu() calls.

Agreed, an additional level of uselessness, though not invalidity.  After
all, the more advantageous SRCU use cases execute lots of srcu_read_lock()
and srcu_read_unlock() calls and very few synchronize_srcu() calls.
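
To make that shape concrete, here is a minimal sketch of such a use case
(the struct, field, and variable names are made up for the example; this
is not taken from any particular in-kernel user):

#include <linux/srcu.h>
#include <linux/slab.h>

struct foo { int val; };

static struct foo __rcu *my_data;
DEFINE_STATIC_SRCU(my_srcu);

/* Read side: runs often and cheaply, never waits for updaters. */
static int my_reader(void)
{
	int idx, val;

	idx = srcu_read_lock(&my_srcu);
	val = srcu_dereference(my_data, &my_srcu)->val;
	srcu_read_unlock(&my_srcu, idx);
	return val;
}

/* Update side: rare, and pays the full grace-period latency. */
static void my_updater(struct foo *newp)
{
	struct foo *oldp;

	oldp = rcu_replace_pointer(my_data, newp, true);
	synchronize_srcu(&my_srcu);	/* Wait for pre-existing readers. */
	kfree(oldp);			/* No reader can still hold a reference. */
}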

> >  but valid nonetheless.
> > 
> > Or is the bug the use of 0:r1 and 0:r2 in the "exists" clause?  If so,
> > then I agree that this is not at all a high-priority bug to flag.
> 
> Yes, that is the bug.  The patched version of LKMM and the 
> implementation you described say the exists clause will never be 
> satisfied, the current version of LKMM says it will always be 
> satisfied, and the theoretical model for SRCU says it will sometimes 
> be satisfied -- which is the answer we want.

Got it, thank you.

> > > > Can you briefly explain how the operational model you have in mind for
> > > > srcu's up and down allows x==1 (and y==0 and idx1==idx2) in the example I
> > > > sent before (copied with minor edit below for convenience)?
> > > > 
> > > > P0{
> > > >     idx1 = srcu_down(&ss);
> > > >     store_rel(p1, true);
> > > > 
> > > > 
> > > >     shared cs
> > > > 
> > > >     R x == 1
> > > > 
> > > >     while (! load_acq(p2));
> > > >     R idx2 == idx1 // for some reason, we got lucky!
> > > >     srcu_up(&ss,idx1);
> > > > }
> > > > 
> > > > P1{
> > > >     idx2 = srcu_down(&ss);
> > > >     store_rel(p2, true);
> > > > 
> > > >     shared cs
> > > > 
> > > >     R y == 0
> > > > 
> > > >     while (! load_acq(p1));
> > > >     srcu_up(&ss,idx2);
> > > > }
> > > > 
> > > > P2 {
> > > >     W y = 1
> > > >     srcu_sync(&ss);
> > > >     W x = 1
> > > > }
> > > > 
> > > > 
> > > > I can imagine models that allow this but they aren't pretty. Maybe you have
> > > > a better operational model?
> > > 
> > > The operational model is not very detailed as far as SRCU is concerned.  
> > > It merely says that synchronize_srcu() executing on CPU C waits until:
> > > 
> > > 	All writes received by C prior to the start of the function have 
> > > 	propagated to all CPUs (call this time t1).  This could be 
> > > 	arranged by having synchronize_srcu() start with an smp_mb().
> > > 
> > > 	For every srcu_down_read() that executed prior to t1, the 
> > > 	matching srcu_up_read() has finished and all writes received 
> > > 	by the unlocking CPU prior to the unlock have propagated to all 
> > > 	CPUs.  This could be arranged by having the srcu_up_read() 
> > > 	call include a release write which has been received by C and 
> > > 	having synchronize_srcu() end with an smp_mb().
> > 
> > Agreed.  It took me a few reads to see that this prohibited later writes
> > by other CPUs from affecting reads in the prior critical section, but the "all
> > writes received by the unlocking CPU" does seem to me to prohibit this.
> > 
> > > The operational model doesn't specify exactly how synchronize_srcu() 
> > > manages to do these things, though.
> > 
> > Which is a good thing, given the wide variety of possible implementations.
> > 
> > > Oh yes, it also says that the value returned by srcu_down_read() is an 
> > > unpredictable int.  This differs from the code in the patched herd 
> > > model, which says that the value will always be 0.
> > 
> > As noted earlier, I believe that this is fine.  If significant problems
> > arise, then we might need to do something.  However, there is some
> > cost to complexity, so we should avoid getting too speculative about
> > possible problems.
> > 
> > > Anyway, the operational model says the litmus test can succeed as 
> > > follows:
> > > 
> > > P0                    P1                     P2
> > > --------------------- ---------------------- -------------------------
> > >                       Widx2=srcu_down_read()
> > >                       Wrel p2=1
> > >                       Ry=0
> > >                                              Wy=1
> > >                                              synchronize_srcu() starts
> > > 	... idx2, p2, and y propagate to all CPUs ...
> > >                                              Time t1
> > > Widx1=srcu_down_read()
> > > Wrel p1=1
> > > 	... idx1 and p1 propagate to all CPUs ...
> > >                       Racq p1=1
> > >                       srcu_up_read(idx2)
> > >                                              synchronize_srcu() ends
> > >                                              Wx=1
> > > Rx=1
> > > Racq p2=1
> > > Ridx2=idx1
> > > srcu_up_read(idx1)
> > > 
> > > (The final equality in P0 is allowed because idx1 and idx2 are both 
> > > random numbers, so they might be equal.)
> > 
> > This all makes sense to me.
> > 
> > > Incidentally, it's worth pointing out that the algorithm Paul described 
> > > will forbid this litmus test even if you remove the while loop and the 
> > > read of idx2 from P0.
> > 
> > Given that the values returned by those two srcu_down_read() calls must
> > be the same, then, yes, the current Linux-kernel Tree RCU implementation
> > would forbid this.
> > 
> > On the other hand, if the two indexes differ, then P2's synchronize_srcu()
> > can see that there are no really old readers on !Widx2, then flip
> > the index.  This would mean that P0's Widx1 would be equal to !Widx2,
> > which has already been waited on.  Then P2's synchronize_srcu() can
> > return as soon as it sees P1's srcu_up_read().
> 
> Sorry, what I said may not have been clear.  I meant that even if you 
> remove the while loop and read of idx2 from P0, your algorithm will 
> still not allow idx1 = idx2 provided everything else is as written.

If synchronize_srcu() has flipped ->srcu_idx by the time that P0's
srcu_down_read() executes, agreed.  Otherwise, Widx1 and Widx2 might
well be equal.
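
The reason they can be equal is that the down operation simply samples
whatever ->srcu_idx happens to be at that moment, something like the
following (a conceptual sketch only, with approximate field names, not
the exact Tree SRCU code):

static int srcu_down_read_sketch(struct srcu_struct *ssp)
{
	int idx;

	idx = READ_ONCE(ssp->srcu_idx) & 0x1;		/* Sample the current index. */
	this_cpu_inc(ssp->sda->srcu_lock_count[idx]);	/* Mark a reader present. */
	smp_mb();	/* Order the increment before the critical section's accesses. */
	return idx;
}

So if the flip has not yet happened by the time P0's srcu_down_read()
samples ->srcu_idx, both readers naturally end up with the same index.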

> > > 	If you don't pass the value to exactly one srcu_up_read() call, 
> > > 	you void the SRCU warranty.  In addition, if you do anything 
> > > 	else with the value that might affect the outcome of the litmus 
> > > 	test, you incur the risk that herd7 might compute an incorrect 
> > > 	result [as in the litmus test I gave near the start of this
> > > 	email].
> > > 
> > > Merely storing the value in a shared variable which then doesn't get 
> > > used or is used only for something inconsequential would not cause any 
> > > problems.
> > 
> > That is consistent with my understanding, but please let me try again
> > in list form:
> 
> ...
> 
> > 4.	If a value returned from a given srcu_read_lock() is passed to
> > 	exactly one srcu_read_unlock(), and then that value is later
> > 	manipulated, that is bad practice (exactly what are you trying
> > 	to accomplish by so doing?), but SRCU won't know the difference.
> > 
> > 	In particular, the Linux-kernel SRCU implementation doesn't know
> > 	about the herd7 "exists" clause, but kudos to Jonas for casting
> > 	his conceptual net widely indeed!
> 
> In addition, herd7 might give an answer different from what would 
> actually happen in the kernel, depending on what the manipulation does.

True, given that the kernel's srcu_read_lock() can return a non-zero
value.
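
For example, consider a test along these lines (hypothetical, in the
same style as the one near the top of this email):

P0(srcu_struct *ss)
{
	int r1;

	r1 = srcu_read_lock(ss);
	srcu_read_unlock(ss, r1);
}

exists (0:r1=0)

The patched herd model would report that the exists clause is always
satisfied, while the kernel is free to hand back a non-zero index.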

> Yes, that is more or less what I was trying to express.

Sounds good!

							Thanx, Paul
