Message-ID: <CAHWkzRQSaKOM23yg1LbCO=uWremNzwnXUCUJF2H-+z_Xhmp79g@mail.gmail.com>
Date:	Tue, 18 Feb 2014 12:12:06 +0000
From:	Peter Sewell <Peter.Sewell@...cam.ac.uk>
To:	Peter Sewell <Peter.Sewell@...cam.ac.uk>,
	"mark.batty@...cam.ac.uk" <Mark.Batty@...cam.ac.uk>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	peterz@...radead.org, Torvald Riegel <triegel@...hat.com>,
	torvalds@...ux-foundation.org, Will Deacon <will.deacon@....com>,
	Ramana.Radhakrishnan@....com, dhowells@...hat.com,
	linux-arch@...r.kernel.org, linux-kernel@...r.kernel.org,
	akpm@...ux-foundation.org, mingo@...nel.org, gcc@....gnu.org
Subject: Re: [RFC][PATCH 0/5] arch: atomic rework
Several of you have said that the standard and compiler should not
permit speculative writes of atomics, or (effectively) that the
compiler should preserve dependencies.  In simple examples it's easy
to see what that means, but in general it's not so clear what the
language should guarantee, because dependencies may go via non-atomic
code in other compilation units, and we have to consider the extent to
which it's desirable to limit optimisation there.
For example, suppose we have, in one compilation unit:
    void f(int ra, int*rb) {
      if (ra==42)
        *rb=42;
      else
        *rb=42;
    }
and in another compilation unit the bodies of two threads:
    // Thread 0
    r1 = x;
    f(r1,&r2);
    y = r2;
    // Thread 1
    r3 = y;
    f(r3,&r4);
    x = r4;
where accesses to x and y are annotated C11 atomic
memory_order_relaxed or Linux ACCESS_ONCE(), accesses to
r1,r2,r3,r4,ra,rb are not annotated, and x and y initially hold 0.
(Of course, this is an artificial example, to make the point below as
simply as possible - in real code the branches of the conditional
might not be syntactically identical, just equivalent after macro
expansion and other optimisation.)
In the source program there's a dependency from the read of x to the
write of y in Thread 0, and from the read of y to the write of x in
Thread 1.  Dependency-respecting compilation would preserve those, and
the ARM and POWER architectures both respect them, so the reads of x
and y could not both give 42.
But a compiler might well optimise the (non-atomic) body of f() to
just *rb=42, making the threads effectively
    // Thread 0
    r1 = x;
    y = 42;
    // Thread 1
    r3 = y;
    x = 42;
(GCC does this at -O1, -O2, and -O3), and the ARM and POWER
architectures permit those two reads both to see 42.  Moreover, that
outcome is actually observable on current ARM hardware.
So as far as we can see, either:
1) if you can accept the latter behaviour (if the Linux codebase does
   not rely on its absence), the language definition should permit it,
   and current compiler optimisations can be used,
or
2) otherwise, the language definition should prohibit it but the
   compiler would have to preserve dependencies even in compilation
   units that have no mention of atomics.  It's unclear what the
   (runtime and compiler development) cost of that would be in
   practice - perhaps Torvald could comment?
For more context, this example is taken from a summary of the thin-air
problem by Mark Batty and myself,
<www.cl.cam.ac.uk/~pes20/cpp/notes42.html>, and the problem with
dependencies via other compilation units was AFAIK first pointed out
by Hans Boehm.
Peter