Message-ID: <20230118200601.GH2948950@paulmck-ThinkPad-P17-Gen-1>
Date: Wed, 18 Jan 2023 12:06:01 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Alan Stern <stern@...land.harvard.edu>
Cc: Andrea Parri <parri.andrea@...il.com>,
Jonas Oberhauser <jonas.oberhauser@...wei.com>,
Peter Zijlstra <peterz@...radead.org>, will <will@...nel.org>,
"boqun.feng" <boqun.feng@...il.com>, npiggin <npiggin@...il.com>,
dhowells <dhowells@...hat.com>,
"j.alglave" <j.alglave@....ac.uk>,
"luc.maranget" <luc.maranget@...ia.fr>, akiyks <akiyks@...il.com>,
dlustig <dlustig@...dia.com>, joel <joel@...lfernandes.org>,
urezki <urezki@...il.com>,
quic_neeraju <quic_neeraju@...cinc.com>,
frederic <frederic@...nel.org>,
Kernel development list <linux-kernel@...r.kernel.org>
Subject: Re: Internal vs. external barriers (was: Re: Interesting LKMM litmus
test)
On Wed, Jan 18, 2023 at 11:50:24AM -0500, Alan Stern wrote:
> On Tue, Jan 17, 2023 at 07:50:41PM -0800, Paul E. McKenney wrote:
> > On Tue, Jan 17, 2023 at 03:15:06PM -0500, Alan Stern wrote:
> > > On Tue, Jan 17, 2023 at 09:43:08AM -0800, Paul E. McKenney wrote:
> > > > On Tue, Jan 17, 2023 at 10:56:34AM -0500, Alan Stern wrote:
> > > > > Isn't it true that the current code will flag srcu-bad-nesting if a
> > > > > litmus test has non-nested overlapping SRCU read-side critical sections?
> > > >
> > > > Now that you mention it, it does indeed, flagging srcu-bad-nesting.
> > > >
> > > > Just to see if I understand, different-values yields true if the set
> > > > contains multiple elements with the same value mapping to different
> > > > values. Or, to put it another way, if the relation does not correspond
> > > > to a function.
> > >
> > > As I understand it, given a relation r (i.e., a set of pairs of events),
> > > different-values(r) returns the sub-relation consisting of those pairs
> > > in r for which the value associated with the first event of the pair is
> > > different from the value associated with the second event of the pair.
> >
> > OK, so different-values(r) is different than (r \ id) because the
> > former operates on values and the latter on events?
>
> No. Both of these things are relations, not values or events.
>
> Suppose you had:
>
> A: WRITE_ONCE(x, 1);
> B: WRITE_ONCE(y, 1);
> C: WRITE_ONCE(z, 2);
>
> Then the po relation would consist of the pairs (A,B), (A,C), and (B,C).
>
> The different-values(po) relation would include only (A,C) and (B,C).
> It would not include (A,B) because the two events in that pair have the
> same value: 1.
>
> And finally, (po \ id) would be the same as po, because the id relation
> consists of the pairs (A,A), (B,B), and (C,C) -- and none of those are
> in po to begin with, so removing them from po doesn't do anything.
Thank you for the much-needed tutorial!
So the values being compared are those attached to the two events in each
pair, one from the domain and one from the range, then.  Good.
> > > Right now the behavior is kind of strange. The following simple litmus
> > > test:
> > >
> > > C test
> > > {}
> > > P0(int *x)
> > > {
> > > 	int r1;
> > > 	r1 = srcu_read_lock(x);
> > > 	srcu_read_unlock(x, r1);
> > > }
> > > exists (~0:r1=0)
> > >
> > > produces the following output from herd7:
> > >
> > > Test test Allowed
> > > States 1
> > > 0:r1=906;
> > > Ok
> > > Witnesses
> > > Positive: 1 Negative: 0
> > > Condition exists (not (0:r1=0))
> > > Observation test Always 1 0
> > > Time test 0.01
> > > Hash=2f42c87ae9c1d267f4e80c66f646b9bb
> > >
> > > Don't ask me where that 906 value comes from or why it isn't 0. Also,
> > > herd7's graphical output shows there is no data dependency from the lock
> > > to the unlock, but we need to have one.
> >
> > Is it still the case that any herd7 value greater than 127 is special?
>
> I have no idea.
Boqun mentioned off-list this morning that this is still the case,
and that each execution of srcu_read_lock() will return a unique value.
Assuming that I understood him correctly, anyway.
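If so, a quick (and so far untested) check would be something like the
following sketch of mine, where herd7's "States" line ought to show two
different cookie values for r1 and r2, and the "exists" clause ought to
never be satisfied:

C srcu-unique-cookies
{}

P0(int *s)
{
	int r1;
	int r2;

	r1 = srcu_read_lock(s);
	srcu_read_unlock(s, r1);
	r2 = srcu_read_lock(s);
	srcu_read_unlock(s, r2);
}

exists (0:r1=0 /\ 0:r2=0)

But I will believe the tooling only once I have actually run it.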
> > > > Given an Srcu-down and an Srcu-up:
> > > >
> > > > let srcu-rscs = ( return_value(Srcu-lock) ; (dep | rfi)* ;
> > > > parameter(Srcu-unlock, 2) ) |
> > > > ( return_value(Srcu-down) ; (dep | rf)* ;
> > > > parameter(Srcu-up, 2) )
> > > >
> > > > Seem reasonable, or am I missing yet something else?
> > >
> > > Not at all reasonable.
> > >
> > > For one thing, consider this question: Which statements lie inside a
> > > read-side critical section?
> >
> > Here srcu_down_read() and srcu_up_read() are to srcu_read_lock() and
> > srcu_read_unlock() as down_read() and up_read() are to mutex_lock()
> > and mutex_unlock(). Not that this should be all that much comfort
> > given that I have no idea how one would go about modeling down_read()
> > and up_read() in LKMM.
>
> It might make sense to work on that first, before trying to do
> srcu_down_read() and srcu_up_read().
The thing is that it is easy to associate an srcu_down_read() with the
corresponding srcu_up_read(): the cookie returned by the former is passed
to the latter.  With down() and up(), the association might be represented
in the Linux kernel by a data structure tracking (say) an I/O request, but
LKMM is going to be hard pressed to figure that out.
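Something like this completely made-up example (hypothetical names and
all, so just a sketch) is what I have in mind for srcu_down_read():
the cookie rides along in the request, so the matching srcu_up_read()
is easy to identify even though it runs in some other context:
------------------------------------------------------------------------
#include <linux/srcu.h>

DEFINE_STATIC_SRCU(foo_srcu);		/* hypothetical srcu_struct */

struct foo_request {
	int srcu_idx;			/* cookie from srcu_down_read() */
	/* ... whatever else the I/O needs ... */
};

/* Runs in the submitting task. */
static void foo_submit(struct foo_request *req)
{
	req->srcu_idx = srcu_down_read(&foo_srcu);
	/* Hand req off to an irq handler, workqueue, or other task. */
}

/* Runs later, possibly in a completely different context. */
static void foo_complete(struct foo_request *req)
{
	srcu_up_read(&foo_srcu, req->srcu_idx);
}
------------------------------------------------------------------------
In contrast, a down()/up() pair used the same way leaves no such cookie
behind for LKMM (or anyone else) to key off of.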
If I am not too confused, the bell code would look something like this
(NOT FOR MAINLINE!):
------------------------------------------------------------------------
(* Compute matching pairs of nested Srcu-lock and Srcu-unlock *)
let srcu-rscs = ([Srcu-lock] ; (data | rf)* ; [Srcu-unlock]) & loc
(* Validate nesting *)
empty Srcu-lock \ domain(srcu-rscs) as mismatched-srcu-locking
empty Srcu-unlock \ range(srcu-rscs) as mismatched-srcu-unlocking
flag ~empty (srcu-rscs^-1 ; srcu-rscs) \ id as multiple-srcu-unlocks
(* Check for use of synchronize_srcu() inside an RCU critical section *)
flag ~empty rcu-rscs & (po ; [Sync-srcu] ; po) as invalid-sleep
(* Validate SRCU dynamic match *)
flag ~empty different-values(srcu-rscs) as srcu-bad-nesting
------------------------------------------------------------------------
A for-mainline version would use Srcu-down and Srcu-up rather than
hijacking the current Srcu-lock and Srcu-unlock. Which seems to require
herd7 changes, but not unless/until we have agreement that this is a
reasonable thing to do.
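For reference, here is my (untested) sketch of the sort of non-nested
overlap that the current srcu-bad-nesting check flags, as discussed at
the start of this thread.  Whether the data-dependency-based matching
above should continue to flag it, or instead treat it as two independent
critical sections, is exactly the sort of thing that needs deciding:

C srcu-non-nested-overlap
{}

P0(int *s)
{
	int r1;
	int r2;

	r1 = srcu_read_lock(s);
	r2 = srcu_read_lock(s);
	srcu_read_unlock(s, r1);
	srcu_read_unlock(s, r2);
}

exists (0:r1=0)

(The "exists" clause is immaterial here, given that the flag is the point.)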
> > > With srcu_read_lock() and a matching srcu_read_unlock(), the answer is
> > > clear: All statements po-between the two. With srcu_down_read() and
> > > srcu_up_read(), the answer is cloudy in the extreme.
> >
> > And I agree that it must be clearly specified, and that my previous try
> > was completely lacking. Here is a second attempt:
> >
> > let srcu-rscs = (([Srcu-lock] ; data ; [Srcu-unlock]) & loc) |
> > (([Srcu-down] ; (data | rf)* ; [Srcu-up]) & loc)
> >
> > (And I see your proposal and will try it.)
> >
> > > Also, bear in mind that the Fundamental Law of RCU is formulated in
> > > terms of stores propagating to a critical section's CPU. What are we to
> > > make of this when a single critical section can belong to more than one
> > > CPU?
> >
> > One way of answering this question is by analogy with down() and up()
> > when used as a cross-task mutex. Another is by mechanically applying
> > some of current LKMM. Let's start with this second option.
> >
> > LKMM works mostly with critical sections, but we also discussed ordering
> > based on the set of events po-after an srcu_read_lock() on the one hand
> > and the set of events po-before an srcu_read_unlock() on the other.
> > Starting here, the critical section is the intersection of these two sets.
> >
> > In the case of srcu_down_read() and srcu_up_read(), as you say, whatever
> > might be a critical section must span processes. So what if instead of
> > po, we used (say) xbstar? Then given a set A such that ([Srcu-down] ;
> > xbstar ; A) and a set B such that (B ; xbstar ; [Srcu-up]), the critical
> > section is the intersection of A and B.
> >
> > One objection to this approach is that a bunch of unrelated events could
> > end up being defined as part of the critical section. Except that this
> > happens already anyway in real critical sections in the Linux kernel.
> >
> > So what about down() and up() when used as cross-task mutexes?
> > These often do have conceptual critical sections that protect some
> > combination of resource, but these critical sections might span tasks
> > and/or workqueue handlers. And any reasonable definition of these
> > critical sections would be just as likely to pull in unrelated accesses as
> > the above intersection approach for srcu_down_read() and srcu_up_read().
> >
> > But I am just now making all this up, so thoughts?
>
> Maybe we don't really need to talk about read-side critical sections at
> all. Once again, here's what explanation.txt currently says:
>
> For any critical section C and any grace period G, at least
> one of the following statements must hold:
>
> (1) C ends before G does, and in addition, every store that
> propagates to C's CPU before the end of C must propagate to
> every CPU before G ends.
>
> (2) G starts before C does, and in addition, every store that
> propagates to G's CPU before the start of G must propagate
> to every CPU before C starts.
>
> Suppose we change this to:
>
> For any RCU lock operation L and matching unlock operation U,
> and any matching grace period G, at least one of the following
> statements must hold:
>
> (1) U executes before G ends, and in addition, every store that
> propagates to U's CPU before U executes must propagate to
> every CPU before G ends.
>
> (2) G starts before L executes, and in addition, every store that
> propagates to G's CPU before the start of G must propagate
> to every CPU before L executes.
>
> (For SRCU, G matches L and U if it operates on the same srcu structure.)
>
> This can be applied sensibly to regular RCU, regular SRCU, and the
> up/down version of SRCU. Maybe it's what we want.
I do like your proposed change!
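For example, unless I am misreading it, the new wording still forbids
the usual message-passing outcome.  Here is my (untested) sketch: if P1's
read of x misses P0's store, then statement (1) cannot have held, so
statement (2) applies, P1's store to y must propagate to every CPU before
the srcu_read_lock() executes, and P0's read of y must therefore return 1.
Either way, the "exists" clause below should stay forbidden:

C srcu-mp
{}

P0(int *x, int *y, int *s)
{
	int r1;
	int r2;

	r1 = srcu_read_lock(s);
	r2 = READ_ONCE(*y);
	WRITE_ONCE(*x, 1);
	srcu_read_unlock(s, r1);
}

P1(int *x, int *y, int *s)
{
	int r3;

	WRITE_ONCE(*y, 1);
	synchronize_srcu(s);
	r3 = READ_ONCE(*x);
}

exists (0:r2=0 /\ 1:r3=0)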
> > > Indeed, given:
> > >
> > > P0(int *x) {
> > > srcu_down_read(x);
> > > }
> > >
> > > P1(int *x) {
> > > srcu_up_read(x);
> > > }
> > >
> > > what are we to make of executions in which P1 executes before P0?
> >
> > Indeed, there had better be something else forbidding such executions, or
> > this is an invalid use of srcu_down_read() and srcu_up_read(). This might
> > become more clear if the example is expanded to include the index returned
> > from srcu_down_read() that is to be passed to srcu_up_read():
> >
> > P0(int *x, int *i) {
> > 	WRITE_ONCE(*i, srcu_down_read(x));
> > }
> >
> > P1(int *x, int *i) {
> > 	srcu_up_read(x, READ_ONCE(*i));
> > }
>
> Hmmm. What happens if you write:
>
> r1 = srcu_down_read(x);
> r2 = srcu_down_read(x);
> srcu_up_read(x, r1);
> srcu_up_read(x, r2);
>
> ? I can't even tell what that would be _intended_ to do.
Let's take it one line at a time:
	r1 = srcu_down_read(x);
	// A
	r2 = srcu_down_read(x);
	// B
	srcu_up_read(x, r1);
	// C
	srcu_up_read(x, r2);
	// D
An SRCU grace period that starts at A is permitted to complete at
C, difficult though it might be to actually make this happen in the
Linux kernel.  It need wait only for pre-existing critical sections.
An SRCU grace period that starts at B must wait for both critical
sections, and one that starts at C must wait for the second critical
section; either way, such a grace period cannot end until D.
This applies to srcu_read_lock() and srcu_read_unlock() just as much as
to srcu_down_read() and srcu_up_read(), correct? Each SRCU read-side
critical section is its own thing, and they do not flatten the way that
RCU read-side critical sections do.
I don't know of a safe and sane use of this pattern, as noted here:
https://paulmck.livejournal.com/40593.html
But someone might come up with such a use.
> In fact, it seems likely that to make this work, you have to store at
> least two values in *x: the value of the up/down counter, and the value
> returned by srcu_down_read or stored by srcu_up_read. That means you
> can't describe what's happening without using a structure, and herd7
> doesn't support structures.
Yes, if we needed to combine the two overlapping read-side critical
sections into a single larger critical section, this would be a problem.
But we do not,
because an SRCU grace period beginning just after the WRITE_ONCE(*x, 1)
is allowed to end right after the srcu_up_read(s, r1). That grace period
is not required to wait for the end of the second critical section.
Thanx, Paul