Message-ID: <CAHWkzRRHMjji3oZx5C8S57Z+Xf9NSoK4AyeXx=NMU8JSB3-=6g@mail.gmail.com>
Date: Tue, 4 Mar 2014 22:11:37 +0000
From: Peter Sewell <Peter.Sewell@...cam.ac.uk>
To: Torvald Riegel <triegel@...hat.com>
Cc: Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Will Deacon <will.deacon@....com>,
Peter Zijlstra <peterz@...radead.org>,
Ramana Radhakrishnan <Ramana.Radhakrishnan@....com>,
David Howells <dhowells@...hat.com>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"mingo@...nel.org" <mingo@...nel.org>,
"gcc@....gnu.org" <gcc@....gnu.org>
Subject: Re: [RFC][PATCH 0/5] arch: atomic rework
On 3 March 2014 20:44, Torvald Riegel <triegel@...hat.com> wrote:
> On Sun, 2014-03-02 at 04:05 -0600, Peter Sewell wrote:
>> On 1 March 2014 08:03, Paul E. McKenney <paulmck@...ux.vnet.ibm.com> wrote:
>> > On Sat, Mar 01, 2014 at 04:06:34AM -0600, Peter Sewell wrote:
>> >> Hi Paul,
>> >>
>> >> On 28 February 2014 18:50, Paul E. McKenney <paulmck@...ux.vnet.ibm.com> wrote:
>> >> > On Thu, Feb 27, 2014 at 12:53:12PM -0800, Paul E. McKenney wrote:
>> >> >> On Thu, Feb 27, 2014 at 11:47:08AM -0800, Linus Torvalds wrote:
>> >> >> > On Thu, Feb 27, 2014 at 11:06 AM, Paul E. McKenney
>> >> >> > <paulmck@...ux.vnet.ibm.com> wrote:
>> >> >> > >
>> >> >> > > 3. The comparison was against another RCU-protected pointer,
>> >> >> > > where that other pointer was properly fetched using one
>> >> >> > > of the RCU primitives. Here it doesn't matter which pointer
>> >> >> > > you use. At least as long as the rcu_assign_pointer() for
>> >> >> > > that other pointer happened after the last update to the
>> >> >> > > pointed-to structure.
>> >> >> > >
>> >> >> > > I am a bit nervous about #3. Any thoughts on it?
>> >> >> >
>> >> >> > I think that it might be worth pointing out as an example, and saying
>> >> >> > that code like
>> >> >> >
>> >> >> >     p = atomic_read(consume);
>> >> >> >     X;
>> >> >> >     q = atomic_read(consume);
>> >> >> >     Y;
>> >> >> >     if (p == q)
>> >> >> >             data = p->val;
>> >> >> >
>> >> >> > then the access of "p->val" is constrained to be data-dependent on
>> >> >> > *either* p or q, but you can't really tell which, since the compiler
>> >> >> > can decide that the values are interchangeable.
>> >> >> >
>> >> >> > I cannot for the life of me come up with a situation where this would
>> >> >> > matter, though. If "X" contains a fence, then that fence will be a
>> >> >> > stronger ordering than anything the consume through "p" would
>> >> >> > guarantee anyway. And if "X" does *not* contain a fence, then the
>> >> >> > atomic reads of p and q are unordered *anyway*, so then whether the
>> >> >> > ordering to the access through "p" is through p or q is kind of
>> >> >> > irrelevant. No?
>> >> >>
>> >> >> I can make a contrived litmus test for it, but you are right, the only
>> >> >> time you can see it happen is when X has no barriers, in which case
>> >> >> you don't have any ordering anyway -- both the compiler and the CPU can
>> >> >> reorder the loads into p and q, and the read from p->val can, as you say,
>> >> >> come from either pointer.
>> >> >>
>> >> >> For whatever it is worth, here is the litmus test:
>> >> >>
>> >> >> T1: p = kmalloc(...);
>> >> >>     if (p == NULL)
>> >> >>             deal_with_it();
>> >> >>     p->a = 42;  /* Each field in its own cache line. */
>> >> >>     p->b = 43;
>> >> >>     p->c = 44;
>> >> >>     atomic_store_explicit(&gp1, p, memory_order_release);
>> >> >>     p->b = 143;
>> >> >>     p->c = 144;
>> >> >>     atomic_store_explicit(&gp2, p, memory_order_release);
>> >> >>
>> >> >> T2: p = atomic_load_explicit(&gp2, memory_order_consume);
>> >> >>     r1 = p->b;  /* Guaranteed to get 143. */
>> >> >>     q = atomic_load_explicit(&gp1, memory_order_consume);
>> >> >>     if (p == q) {
>> >> >>             /* The compiler decides that q->c is the same as p->c. */
>> >> >>             r2 = p->c;  /* Could get 44 on a weakly ordered system. */
>> >> >>     }
>> >> >>
>> >> >> The loads from gp1 and gp2 are, as you say, unordered, so you get what
>> >> >> you get.
>> >> >>
>> >> >> And publishing a structure via one RCU-protected pointer, updating it,
>> >> >> then publishing it via another pointer seems to me to be asking for
>> >> >> trouble anyway. If you really want to do something like that and still
>> >> >> see consistency across all the fields in the structure, please put a lock
>> >> >> in the structure and use it to guard updates and accesses to those fields.
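For concreteness, that locked variant might look something like the
following (the struct layout and lock names here are invented for
illustration, not taken from any actual patch):

    struct foo {
        spinlock_t lock;        /* guards b and c */
        int a, b, c;
    };

    /* Updater: hold the lock across the field updates. */
    spin_lock(&p->lock);
    p->b = 143;
    p->c = 144;
    spin_unlock(&p->lock);
    atomic_store_explicit(&gp2, p, memory_order_release);

    /* Reader: take the same lock before looking at the fields. */
    p = atomic_load_explicit(&gp2, memory_order_consume);
    spin_lock(&p->lock);
    r1 = p->b;
    r2 = p->c;              /* b and c now seen consistently */
    spin_unlock(&p->lock);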
>> >> >
>> >> > And here is a patch documenting the restrictions for the current Linux
>> >> > kernel. The rules change a bit due to rcu_dereference() acting a bit
>> >> > differently than atomic_load_explicit(&p, memory_order_consume).
>> >> >
>> >> > Thoughts?
>> >>
>> >> That might serve as informal documentation for linux kernel
>> >> programmers about the bounds on the optimisations that you expect
>> >> compilers to do for common-case RCU code - and I guess that's what you
>> >> intend it to be for. But I don't see how one can make it precise
>> >> enough to serve as a language definition, so that compiler people
>> >> could confidently say "yes, we respect that", which I guess is what
>> >> you really need. As a useful criterion, we should aim for something
>> >> precise enough that in a verified-compiler context you can
>> >> mathematically prove that the compiler will satisfy it (even though
>> >> that won't happen anytime soon for GCC), and that analysis tool
>> >> authors can actually know what they're working with. All this stuff
>> >> about "you should avoid cancellation", and "avoid masking with just a
>> >> small number of bits" is just too vague.
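For instance, presumably the sort of thing being warned against there is
(names invented for the example):

    p = atomic_load_explicit(&gp, memory_order_consume);
    r1 = *(base + (p - p));     /* "cancellation": p - p is always 0, so the
                                   compiler can drop the use of p entirely */

    i = (uintptr_t)p & 1;       /* masking to one bit: the compiler can turn
                                   this into a two-way branch on the bit,
                                   again breaking the address dependency */
    r2 = tab[i];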
>> >
>> > Understood, and yes, this is intended to document current compiler
>> > behavior for the Linux kernel community. It would not make sense to show
>> > it to the C11 or C++11 communities, except perhaps as an informational
>> > piece on current practice.
>> >
>> >> The basic problem is that the compiler may be doing sophisticated
>> >> reasoning with a bunch of non-local knowledge that it's deduced from
>> >> the code, neither of which is well understood, and here we have to
>> >> identify some envelope, expressive enough for RCU idioms, in which
>> >> that reasoning doesn't allow data/address dependencies to be removed
>> >> (and hence the hardware guarantee about them will be maintained at the
>> >> source level).
>> >>
>> >> The C11 syntactic notion of dependency, whatever its faults, was at
>> >> least precise, could be reasoned about locally (just looking at the
>> >> syntactic code in question), and did do that. The fact that current
>> >> compilers do optimisations that remove dependencies and will likely
>> >> have many bugs at present is beside the point - this was surely
>> >> intended as a *new* constraint on what they are allowed to do. The
>> >> interesting question is really whether the compiler writers think that
>> >> they *could* implement it in a reasonable way - I'd like to hear
>> >> Torvald and his colleagues' opinion on that.
>> >>
>> >> What you're doing above seems to be basically a very cut-down version
>> >> of that, but with a fuzzy boundary. If you want it to be precise,
>> >> maybe it needs to be much simpler (which might force you into ruling
>> >> out some current code idioms).
>> >
>> > I hope that Torvald Riegel's proposal (https://lkml.org/lkml/2014/2/27/806)
>> > can be developed to serve this purpose.
>>
>> (I missed that mail when it first came past, sorry)
>>
>> That's also going to be tricky, I'm afraid. The key condition there is:
>>
>> "* at the time of execution of E, L [PS: I assume that L is a
>> typo and should be E]
>
> No, L was intended. (But see below.)
then I misunderstood...
>> can possibly have returned at
>> least two different values under the assumption that L itself
>> could have returned any value allowed by L's type."
>>
>> First, the evaluation of E might be nondeterministic - e.g., for an
>> artificial example, if it's just a nondeterministic value obtained
>> from the result of a race on SC atomics. The above doesn't
>> distinguish between that (which doesn't have a real dependency on L)
>> and that XOR'd with L (which does). And it does so in the wrong
>> direction: it'll say that the former has a dependency on L.
>
> I'm not quite sure I understand the examples you want to point out
> (could you add brief code snippets, perhaps?) -- but the informal
> definition I proposed also says that E must have used L in some way to
> make its computation. That's pretty vague, and the way I phrased the
> requirement is probably not optimal.
...so this example isn't so relevant, but it's maybe interesting
anyway. In free-wheeling pseudocode:
    x = read_consume(L)  // a memory_order_consume read of a 1-bit value from L
    y = nondet()         // some code that nondeterministically produces
                         // another 1-bit value, e.g. by spawning two threads
                         // that each do an SC-atomic write to some other
                         // location, returning 0 or 1 depending on which wins

then evaluate either just

    z = y

or

    z = x XOR y
In my misinterpretation of what you wrote, your definition would say
there's a dependency from the load of L to the evaluation of y, even
though there isn't.
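Spelled out as a rough C11 sketch (thrd_* from C11 <threads.h>; assume L
only ever holds 0 or 1):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <threads.h>

    _Atomic int L;                 /* the 1-bit value read with mo_consume */
    _Atomic int winner;

    static int racer(void *arg) {
        atomic_store(&winner, (int)(intptr_t)arg);  /* SC-atomic write */
        return 0;
    }

    static void example(void) {
        thrd_t t0, t1;
        int x = atomic_load_explicit(&L, memory_order_consume);
        thrd_create(&t0, racer, (void *)(intptr_t)0);
        thrd_create(&t1, racer, (void *)(intptr_t)1);
        thrd_join(t0, NULL);
        thrd_join(t1, NULL);
        int y = atomic_load(&winner);  /* 0 or 1, whichever racer won */
        int z1 = y;                    /* no real dependency on L     */
        int z2 = x ^ y;                /* really does use x           */
        (void)z1; (void)z2;
    }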
> So let me expand a bit on the background first. What I tried to capture
> with the rules is that an evaluation (ie, E), really uses the value
> returned by L, and not just, for example, a constant value that can be
> inferred from whatever was executed between L and E. This (is intended
> to) prevent value prediction and such by programmers. The compiler in
> turn knows what a real program is allowed to do that still wants to rely
> on the ordering by the consume. Basing all of this on the values seemed
> to be helpful because basing it on *purely* syntax didn't seem
> implementable; for the latter, we'd need to disallow cases such as x-x
> or require compilers to include artificial dependencies if a program
> indeed does value speculation. IOW, it didn't seem possible to draw a
> clear distinction between a data dependency defined purely based on
> syntax and control dependencies.
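To make the x-x case concrete (names invented):

    x = atomic_load_explicit(&L, memory_order_consume);
    i = x - x;          /* syntactically uses x, but is always 0, so   */
    r = arr[i];         /* under a value-based definition there is no  */
                        /* dependency, and a compiler may load arr[0]  */
                        /* without waiting for x                       */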
>
> Another way to perhaps phrase the requirement might be to construct it
> inductively, so that "E uses the value returned by L" becomes clearer.
> However, I currently don't really know how to do that without any holes.
>
> For example, let's say that an atomic mo_consume load has a value
> dependency to itself; the dependency has an associated set of values,
> which is equal to all values allowed by the type (but see also below).
> Then, an evaluation A that uses B as operand has a value dependency on B
> if the associated set of values can still have more than one element.
> Whether that's the case depends on the operation, and the other
> operands, including their values at the time of execution.
> However, it's not just results of evaluations that effectively constrain
> the set of values. Conditional execution does as well, because the
> condition being true might establish a constraint on L, which might
> remove any dependency when L (or a result of a computation involving L)
> is used. So I'm not sure the inductive approach would work.
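The conditional-execution point, concretely:

    x = atomic_load_explicit(&L, memory_order_consume);
    if (x == 0) {
        r = arr[x];     /* inside the branch the set of possible values */
                        /* for x has collapsed to {0}, so arr[x] can    */
                        /* become arr[0], with no dependency remaining  */
    }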
>
> You said you thought I might have wanted to say that:
> "E really-depends on L if there exist two different values that (just
> according to typing) might be read for L that give rise to two
> different values for E".
>
> Which should follow from the above, or at least should not conflict with
> it. My "definition" tried to capture that the program must not
> establish so many constraints between E and L that the value of L is
> clear in the sense of having exactly one value. If we dereference such
> an E whose execution in fact constrains L to one value, there is no real
> dependency anymore.
> However, by itself, this doesn't cover that E must use L in its
> computation -- which your formulation does, I think.
unfortunately not - that's what the example above shows.
> Another way might be to try to define which constraints established by
> an execution (based on the executed code's semantics) actually remove
> value dependencies on L.
>
> Do you have any suggestions for how to define this (ie, true value
> dependencies) in a better way?
not at the moment. I need to think some more about your vdps, though.
As I understand it, you're agreeing with the general intent of the
original C11 design that some new mechanism to tell the compiler not
to optimise in certain places is required, but differing in the way
you identify those?
>> Second, it involves reasoning about counterfactual executions. That
>> doesn't necessarily make it wrong, per se, but probably makes it hard
>> to work with. For example, suppose that in all the actual
>> whole-program executions, a runtime occurrence of L only ever returns
>> one particular value (perhaps because of some simple #define'd
>> configuration)
>
> Right, or just because it's a deterministic program.
>
> This is a problem for the definition of value dependencies, AFAICT,
> because where do you draw the line for what the compiler is allowed to
> speculate about? When does the program indeed "reveal" the value of a
> load, and when does it not? Is a compiler to do out-of-band tracking
> for certain memory locations, and thus predict the value?
>
> Adding the assumption that L, at least conceptually, might return any
> value seemed to be a simple way to remove that uncertainty regarding
> what the compiler might know about.
...but it brings in a lot of baggage.
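For example, if gp is initialised to &the_foo and nothing else is ever
stored to it, then in

    p = atomic_load_explicit(&gp, memory_order_consume);
    r = p->a;           /* p is &the_foo in every actual execution */

deciding whether that load "can possibly have returned at least two
different values" quantifies over executions that never occur, which is
awkward territory for a compiler-correctness argument.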
>> , and that the code used in the evaluation of E depends
>> on some invariant which is related to that configuration.
>
> A related issue that I've been thinking about is whether things like
> divide-by-zero (ie, operations that give rise to undefined behavior)
> should be considered as constraints on the values or not. I guess they
> should, but then this seems closer to what you mention, invariants
> established by the whole program.
>
>> The
>> hypothetical execution used above in which a different value is used
>> is one in which the code is being run in a situation with broken invariants.
>> Then there will be technical difficulties in using the definition:
>> I don't see how one would persuade oneself that a compiler always
>> satisfies it, because these hypothetical executions are far removed
>> from what it's actually working on.
>
> I think the above is easy to implement for a compiler. At the
> mo_consume load, you simply forget any information you have about this
> value / memory location; for operations on value_dep_preserving types,
are all the subexpressions likewise forced to be of vdp types?
> you ignore any out-of-band information that might originate from code
> run before the load (IOW, you take a fresh start on each load in terms
> of analysis info). Having the requirement that the program indeed needs
> to have a value dependency should allow the compiler to run normal
> optimizations on the code.
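For example (with the qualifier spelled, say, as in your proposal):

    struct foo * value_dep_preserving p =
        atomic_load_explicit(&gp, memory_order_consume);
    r1 = p->a;              /* dependency tracked via the vdp type     */

    struct foo *q = p;      /* plain type: is the dependency           */
    r2 = q->a;              /* deliberately dropped at the conversion, */
                            /* or must the conversion be rejected?     */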
>
> I'd really appreciate any feedback or alternative suggestions for how to
> formulate the requirements more precisely. I think many of us kind of
> agree about the general approach we'd like to try, but specifying this
> precisely is another step.
>