Message-ID: <Y8lynRI35cFeuqb5@rowland.harvard.edu>
Date: Thu, 19 Jan 2023 11:41:01 -0500
From: Alan Stern <stern@...land.harvard.edu>
To: Jonas Oberhauser <jonas.oberhauser@...wei.com>
Cc: "paulmck@...nel.org" <paulmck@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
"parri.andrea" <parri.andrea@...il.com>, will <will@...nel.org>,
"boqun.feng" <boqun.feng@...il.com>, npiggin <npiggin@...il.com>,
dhowells <dhowells@...hat.com>,
"j.alglave" <j.alglave@....ac.uk>,
"luc.maranget" <luc.maranget@...ia.fr>, akiyks <akiyks@...il.com>,
dlustig <dlustig@...dia.com>, joel <joel@...lfernandes.org>,
urezki <urezki@...il.com>,
quic_neeraju <quic_neeraju@...cinc.com>,
frederic <frederic@...nel.org>,
Kernel development list <linux-kernel@...r.kernel.org>
Subject: Re: Internal vs. external barriers (was: Re: Interesting LKMM litmus
test)
On Thu, Jan 19, 2023 at 12:22:50PM +0100, Jonas Oberhauser wrote:
>
>
> On 1/19/2023 3:28 AM, Alan Stern wrote:
> > > This is a permanent error; I've given up. Sorry it didn't
> > work out.
>
> [It seems the e-mail still reached me through the mailing list]
[For everyone else, Jonas is referring to the fact that the last two
emails I sent to his huaweicloud.com address could not be delivered, so
I copied them off-list to his huawei.com address.]
> > > I consider that a hack though and don't like it.
> > It _is_ a bit of a hack, but not a huge one. srcu_read_lock() really
> > is a lot like a load, in that it returns a value obtained by reading
> > something from memory (along with some other operations, though, so it
> > isn't a simple straightforward read -- perhaps more like an
> > atomic_inc_return_relaxed).
> The issue I have with this is that it might create accidental ordering. How
> does it behave when you throw fences in the mix?
I think this isn't going to be a problem. Certainly any real
implementation of srcu_read_lock() is going to involve some actual load
operations, so any unintentional ordering caused by fences will also
apply to real executions. Likewise for srcu_read_unlock and store
operations.
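To make the analogy concrete, a reader in a litmus test looks roughly
like this (just a sketch -- the exact srcu declarations and syntax that
herd accepts may differ):

P0(int *x, struct srcu_struct *s)
{
        int r0;
        int idx;

        /* load-like: returns a value read from memory */
        idx = srcu_read_lock(s);
        r0 = READ_ONCE(*x);
        /* store-like: consumes idx, so it can be a data-dependency target */
        srcu_read_unlock(s, idx);
}

The value idx returned by the lock flows unchanged into the unlock,
which is why treating the lock as a relaxed load-like event and the
unlock as a store-like event that can be the target of a data
dependency is natural.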
> It really does not work like an increment at all; I think srcu_read_lock()
> only reads the currently active index, but the index is changed by
> synchronize_srcu(). But even that is an implementation detail of sorts. I think the
> best way to think of it would be for srcu_read_lock to just return an
> arbitrary value.
I think I'll stick to it always returning the initial value. Paul said
that would be okay.
> The user cannot rely on any kind of "accidental" rfe edges between these
> events for ordering.
>
> Perhaps if you flag any use of these values in address or control
> dependencies, as well as any event which depends on more than one of these
> values, you could prove that it's impossible to constrain the behavior
> through these rfe (and/or co) edges, because you can never inspect the
> value returned by the operation (except to pass it into srcu_read_unlock()).
>
> Or you might be able to explicitly eliminate the events everywhere, just
> like you have done for carry-dep in your patch.
On second thought, I'll make it impossible to read from the
srcu_read_unlock events by removing them from the rf (and rfi/rfe)
relation. Then it won't be necessary to change carry-dep or anything
else.
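In cat-like notation the effect would be roughly the following (only a
sketch: Srcu-unlock is the event set from the model's bell file,
rf-clean is a made-up name):

(* drop every reads-from edge whose source is an srcu_read_unlock event *)
let rf-clean = rf \ ([Srcu-unlock] ; rf)

This is only meant to show which edges get erased; in practice the
erasure would happen where rf is generated, so the rest of the model
wouldn't need to change.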
> But it looks so brittle.
Maybe not so much after this change?
> > srcu_read_unlock() is somewhat less like a store, but it does have one
> > notable similarity: It takes an input value and therefore can be the
> > target of a data dependency. The biggest difference is that an
> > srcu_read_unlock() can't really be read-from. It would be nice if herd
> > had an event type that behaved this way.
> Or if you could declare your own : )
> Obviously, you will have accidental rf edges going from srcu_read_unlock()
> to srcu_read_lock() if you model them this way.
Not when those edges are erased.
> > Also, herd doesn't have any way of saying that the value passed to a
> > store is an unmodified copy of the value obtained by a load. In our
> > case that doesn't matter much -- nobody should be writing litmus tests
> > in which the value returned by srcu_read_lock() is incremented and then
> > decremented again before being passed to srcu_read_unlock()!
>
> It would be nice if herd allowed declaring structs that can be used for such
> purposes.
> (Anyway, I am not sure if Luc is still following everything in this deeply
> nested thread that started somewhere completely different. But if Paul
> or you open a feature request, let me know so that I can give my 2 cents.)
I thought you were against adding into herd features that were specific
to the Linux kernel?
> > > Note that if your ordering relies on actually using gp twice in a row, then
> > > these must come from strong-fence, but you should be able to just take the
> > > shortcut by merging them into a single gp.
> > > po;rcu-gp;po;rcu-gp;po <= gp <= strong-fence <= hb & strong-order
> > I don't know what you mean by this. The example above does rely on
> > having two synchronize_rcu() calls; with only one it would be allowed.
>
> I mean that if you have a cycle that is formed by having two adjacent actual
> `gp` edges, like .... ; gp;gp ; .... with gp= po ; rcu-gp ; po?,
> (not like your example, where the cycle uses two *rcu*-gp but no gp edges)
Don't forget that I had in mind a version of the model where rcu-gp did
not exist.
> and assume we define gp' = po ; rcu-gp ; po and hb' and pb' to use gp'
> instead of gp,
> then there are two cases for how that cycle came to be, either 1) as
> ... ; hb;hb ; ....
> but then you can refactor as
> ... ; po;rcu-gp;po;rcu-gp;po ; ...
> ... ; po;rcu-gp; po ; ...
> ... ; gp' ; ...
> ... ; hb' ; ...
> which again creates a cycle, or 2) as
> ... ; pb ; hb ; ...
> coming from
> ... ; prop ; gp ; gp ; ....
> which you can similarly refactor as
> ... ; prop ; po;rcu-gp;po ; ....
> ... ; prop ; gp' ; ....
> and again get a cycle with
> ... ; pb' ; ....
> Therefore, gp = po;rcu-gp;po should be equivalent.
The point is that in P1, we have Write ->(gp;gp) Read, but we do not
have Write ->(gp';gp') Read. Only Write ->gp' Read. So if you're using
gp' instead of gp, you'll analyze the litmus test as if it had only one
grace period but two critical sections, getting a wrong answer.
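For concreteness, the shape under discussion is roughly a process like
this (a sketch from memory, not necessarily the exact P1 from earlier
in the thread):

P1(int *x, int *y)
{
        int r1;

        WRITE_ONCE(*x, 1);
        /* two consecutive grace periods, nothing in between */
        synchronize_rcu();
        synchronize_rcu();
        r1 = READ_ONCE(*y);
}

With gp = po;rcu-gp;po?, the first gp link can end at the first
synchronize_rcu() fence itself and the second can run from that fence
through the second one to the Read, giving Write ->(gp;gp) Read. With
gp' = po;rcu-gp;po the intermediate event has to lie po-after a fence,
and no rcu-gp fence remains between any such event and the Read, so
only Write ->gp' Read holds.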
Here's a totally different way of thinking about these things, which may
prove enlightening. These thoughts originally occurred to me years ago,
and I had forgotten about them until last night.
If G is a grace period, let's write t1(G) for the time when G starts and
t2(G) for the time when G ends.
Likewise, if C is a read-side critical section, let's write t2(C) for
the time when C starts (or the lock executes if you prefer) and t1(C)
for the time when C ends (or the unlock executes). This terminology
reflects the "backward" role that critical sections play in the memory
model.
Now we can characterize rcu-order and rcu-link in operational terms.
Let A and B each be either a grace period or a read-side critical
section. Then:
A ->rcu-order B means t1(A) < t2(B), and
A ->rcu-link B means t2(A) <= t1(B).
(Of course, we always have t1(G) < t2(G) for any grace period G, whereas
for any critical section C we have t2(C) < t1(C).)
This explains quite a lot. For example, we can justify including
C ->rcu-link G
into rcu-order as follows. From C ->rcu-link G we get that t2(C) <=
t1(G), in other words, C starts when or before G starts. Then the
Fundamental Law of RCU says that C must end before G ends, since
otherwise C would span all of G. Thus t1(C) < t2(G), which is C
->rcu-order G.
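For instance, with made-up times: if G runs from time 10 to time 50 and
C runs from time 5 to time 20, then t2(C) = 5 <= 10 = t1(G), so
C ->rcu-link G; and since C started before G did, it must end before G
does, giving t1(C) = 20 < 50 = t2(G), i.e., C ->rcu-order G.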
The case of G ->rcu-link C is similar.
This also explains why rcu-link can be extended by appending (rcu-order
; rcu-link)*. From X ->rcu-order Y ->rcu-link Z we get that t1(X) <
t2(Y) <= t1(Z) and thus t1(X) <= t1(Z). So if
A ->rcu-link B ->(rcu-order ; rcu-link)* C
then t2(A) <= t1(B) <= t1(C), which justifies A ->rcu-link C.
The same sort of argument shows that rcu-order should be extendable by
appending (rcu-link ; rcu-order)* -- but not (rcu-order ; rcu-link)*.
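Spelled out: if A ->rcu-order B ->rcu-link C ->rcu-order D, then
t1(A) < t2(B) <= t1(C) < t2(D), hence t1(A) < t2(D) and A ->rcu-order D.
A chain that instead ends with an rcu-link step only bounds t1 of its
final element, not t2, which is not the rcu-order condition.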
This also justifies why a lone gp belongs in rcu-order: G ->rcu-order G
holds because t1(G) < t2(G). But for critical sections we have t2(C) <
t1(C) and so C ->rcu-order C does not hold.
Assuming ordinary memory accesses occur in a single instant, you see why
it makes sense to consider (po ; rcu-order ; po) an ordering. But when
you're comparing grace periods or critical sections to each other,
things get a little ambiguous. Should G1 be considered to come before
G2 when t1(G1) < t1(G2), when t2(G1) < t2(G2), or when t2(G1) < t1(G2)?
Springing for (po ; rcu-order ; po?) amounts to choosing the second
alternative.
Alan