Message-ID: <20181218100014.GA16284@hirez.programming.kicks-ass.net>
Date: Tue, 18 Dec 2018 11:00:14 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Andi Kleen <ak@...ux.intel.com>
Cc: Arnd Bergmann <arnd@...db.de>, Nicolas Pitre <nico@...aro.org>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Will Deacon <will.deacon@....com>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 0/7] ARM: hacks for link-time optimization

On Tue, Dec 18, 2018 at 10:18:24AM +0100, Peter Zijlstra wrote:
> In particular turning an address-dependency into a control-dependency,
> which is something allowed by the C language, since it doesn't recognise
> these concepts as such.
>
> The 'optimization' is allowed currently, but LTO will make it much more
> likely since it will have a much wider view of things. Esp. when combined
> with PGO.
>
> Specifically; if you have something like:
>
> 	int idx;
> 	struct object objs[2];
>
> the statement:
>
> 	val = objs[idx & 1].ponies;
>
> which you 'need' to be translated like:
>
> 	struct object *obj = objs;
> 	obj += (idx & 1);
> 	val = obj->ponies;
>
> Such that the load of obj->ponies depends on the load of idx. However
> our dear compiler is allowed to make it:
>
> 	if (idx & 1)
> 		obj = &objs[1];
> 	else
> 		obj = &objs[0];
>
> 	val = obj->ponies;
>
> Because C doesn't recognise this as being different. However this is
> utterly broken, because in this translation we can speculate the load
> of obj->ponies such that it no longer depends on the load of idx, which
> breaks RCU.
>
> Note that further 'optimization' is possible and the compiler could even
> make it:
>
> 	if (idx & 1)
> 		val = objs[1].ponies;
> 	else
> 		val = objs[0].ponies;

A variant that is actually broken on x86 too (due to issuing the loads
in the 'wrong' order):

	val = objs[0].ponies;
	if (idx & 1)
		val = objs[1].ponies;

Which is a translation that makes sense either if we marked the
condition unlikely(idx & 1) or if PGO found the same.

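
For illustration only, here is a userspace sketch of the three
translations as separate C functions. The function names
(load_addr_dep, load_ctrl_dep, load_hoisted), the check_equivalent()
helper and the initial ponies values are made up for this example;
struct object, objs, idx and ponies are from the example above. It only
shows why the compiler is allowed to pick any of them: single-threaded,
they all return the same value. What differs is how the load of ponies
is ordered against the load of idx, which plain C cannot express and
which this sketch does not demonstrate. A compiler may well emit the
same code for all three.

```c
#include <assert.h>

struct object {
	int ponies;
};

/* Arbitrary values, chosen only so the two slots are distinguishable. */
static struct object objs[2] = { { .ponies = 10 }, { .ponies = 20 } };

/* Address dependency: the address of the load is computed from idx. */
static int load_addr_dep(int idx)
{
	struct object *obj = objs;

	obj += (idx & 1);
	return obj->ponies;
}

/* Control dependency: idx only selects which branch is taken. */
static int load_ctrl_dep(int idx)
{
	if (idx & 1)
		return objs[1].ponies;
	else
		return objs[0].ponies;
}

/* Hoisted variant: objs[0] is loaded unconditionally, before the test. */
static int load_hoisted(int idx)
{
	int val = objs[0].ponies;

	if (idx & 1)
		val = objs[1].ponies;
	return val;
}

/* Hypothetical self-check: all three agree when run single-threaded. */
static void check_equivalent(void)
{
	for (int idx = 0; idx < 4; idx++) {
		assert(load_addr_dep(idx) == load_ctrl_dep(idx));
		assert(load_addr_dep(idx) == load_hoisted(idx));
	}
}
```

Calling check_equivalent() passes silently, which is exactly the
problem: nothing observable within the C abstract machine tells these
apart.
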
> Now, granted, this is a fairly artificial example, but it does
> illustrate the exact problem.
>
> The more the compiler can see of the complete program, the more likely
> it can make inferences like this, esp. when coupled with PGO.
>
> Now, we're (usually) very careful to wrap things in READ_ONCE() and
> rcu_dereference() and the like, which makes it harder on the compiler
> (because 'volatile' is special), but nothing really stops it from doing
> this.
>
> Paul has been trying to beat clue into the language people, but given
> he's been at it for 10 years now, and there's no resolution, I figure we
> ought to get compiler implementations to give us a knob.
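
To make the READ_ONCE() remark above concrete, a rough userspace
sketch, assuming GCC or Clang for __typeof__. READ_ONCE_SKETCH() is a
simplified stand-in invented for this example and is not the kernel's
READ_ONCE(); load_ponies() and the ponies values are likewise made up.
The volatile access makes the compiler emit the load of idx as written,
but the indexing that follows is still ordinary C, so nothing here
forbids turning it into a branch.

```c
/* Simplified userspace stand-in, NOT the kernel's READ_ONCE(). */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

struct object {
	int ponies;
};

/* Arbitrary values, chosen only so the two slots are distinguishable. */
static struct object objs[2] = { { .ponies = 10 }, { .ponies = 20 } };

/* Imagine another CPU updating this concurrently. */
static int idx;

static int load_ponies(void)
{
	/* The volatile access is performed as written. */
	int i = READ_ONCE_SKETCH(idx);

	/*
	 * Ordinary C from here on: the language does not require this to
	 * stay an address dependency on the load above.
	 */
	return objs[i & 1].ponies;
}
```
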