linux-kernel - Re: [PATCH 0/7] ARM: hacks for link-time optimization


Message-ID: <20181218100014.GA16284@hirez.programming.kicks-ass.net>
Date:   Tue, 18 Dec 2018 11:00:14 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Andi Kleen <ak@...ux.intel.com>
Cc:     Arnd Bergmann <arnd@...db.de>, Nicolas Pitre <nico@...aro.org>,
        linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        Will Deacon <will.deacon@....com>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 0/7] ARM: hacks for link-time optimization

On Tue, Dec 18, 2018 at 10:18:24AM +0100, Peter Zijlstra wrote:
> In particular turning an address-dependency into a control-dependency,
> which is something allowed by the C language, since it doesn't recognise
> these concepts as such.
> 
> The 'optimization' is allowed currently, but LTO will make it much more
> likely since it will have a much wider view of things. Esp. when combined
> with PGO.
> 
> Specifically; if you have something like:
> 
> int idx;
> struct object objs[2];
> 
> the statement:
> 
>   val = objs[idx & 1].ponies;
> 
> which you 'need' to be translated like:
> 
>   struct object *obj = objs;
>   obj += (idx & 1);
>   val = obj->ponies;
> 
> Such that the load of obj->ponies depends on the load of idx. However
> our dear compiler is allowed to make it:
> 
>   if (idx & 1)
>     obj = &objs[1];
>   else
>     obj = &objs[0];
> 
>   val = obj->ponies;
> 
> Because C doesn't recognise this as being different. However this is
> utterly broken, because in this translation we can speculate the load
> of obj->ponies such that it no longer depends on the load of idx, which
> breaks RCU.
> 
> Note that further 'optimization' is possible and the compiler could even
> make it:
> 
>   if (idx & 1)
>     val = objs[1].ponies;
>   else
>     val = objs[0].ponies;

A variant that is actually broken on x86 too (due to issuing the loads
in the 'wrong' order):

  val = objs[0].ponies;
  if (idx & 1)
    val = objs[1].ponies;

This is a translation that makes sense if we either annotated the
condition as unlikely(idx & 1) or PGO found the branch to be rarely taken.
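
To make the comparison concrete, here is a minimal sketch of the three forms discussed above as standalone C functions. The layout of struct object, the initial values, and all function names are assumptions made for illustration. The check only shows that the forms are indistinguishable to single-threaded C code, which is why the compiler is permitted to pick any of them. It does not and cannot demonstrate the ordering failure, which needs a concurrent writer and a weakly ordered CPU.

```c
#include <assert.h>

/* Hypothetical layout; the thread only names the 'ponies' member. */
struct object {
	int ponies;
};

static struct object objs[2] = { { 10 }, { 20 } };

/* Address dependency: the load address is computed from idx. */
static int load_addr_dep(int idx)
{
	struct object *obj = objs;

	obj += (idx & 1);
	return obj->ponies;
}

/* Control dependency: idx only selects a branch; both addresses are constant. */
static int load_ctrl_dep(int idx)
{
	if (idx & 1)
		return objs[1].ponies;
	else
		return objs[0].ponies;
}

/* Hoisted variant: objs[0] is loaded unconditionally, before idx is tested. */
static int load_hoisted(int idx)
{
	int val = objs[0].ponies;

	if (idx & 1)
		val = objs[1].ponies;
	return val;
}

/* Single-threaded check: all three forms return the same value. */
static void check_forms_agree(void)
{
	for (int idx = 0; idx < 4; idx++) {
		int want = load_addr_dep(idx);

		assert(load_ctrl_dep(idx) == want);
		assert(load_hoisted(idx) == want);
	}
}
```

Calling check_forms_agree() from a test harness passes on any conforming compiler. The difference between the functions only matters when idx comes from a load that another CPU's store is ordered against.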

> Now, granted, this is a fairly artificial example, but it does
> illustrate the exact problem.
> 
> The more the compiler can see of the complete program, the more likely
> it can make inferences like this, esp. when coupled with PGO.
> 
> Now, we're (usually) very careful to wrap things in READ_ONCE() and
> rcu_dereference() and the like, which makes it harder on the compiler
> (because 'volatile' is special), but nothing really stops it from doing
> this.
> 
> Paul has been trying to beat clue into the language people, but given
> he's been at it for 10 years now, and there's no resolution, I figure we
> ought to get compiler implementations to give us a knob.
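
As an illustration of the READ_ONCE() point in the quoted text, here is a minimal sketch. READ_ONCE_SKETCH is a simplified stand-in modelled on the older ACCESS_ONCE()-style volatile cast, not the kernel's actual READ_ONCE() definition. It relies on the GNU __typeof__ extension. The names shared_idx and read_ponies and the layout of struct object are assumptions. The volatile access forces exactly one load of the index, but it says nothing about how the dependent access is generated.

```c
#include <assert.h>

/* Simplified stand-in for READ_ONCE(); scalar types only (assumption). */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical layout; the thread only names the 'ponies' member. */
struct object {
	int ponies;
};

static struct object objs[2] = { { 10 }, { 20 } };
static int shared_idx;	/* hypothetical: updated by another CPU */

static int read_ponies(void)
{
	/* The volatile access yields exactly one load of shared_idx. */
	int idx = READ_ONCE_SKETCH(shared_idx);

	/*
	 * Nothing in C requires this to be compiled as an indexed load;
	 * a branch on (idx & 1) with constant addresses is equally valid.
	 */
	return objs[idx & 1].ponies;
}

/* Single-threaded sanity check of the sketch. */
static void check_read_ponies(void)
{
	shared_idx = 1;
	assert(read_ponies() == 20);
	shared_idx = 2;
	assert(read_ponies() == 10);
}
```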