Date:   Wed, 8 Mar 2017 11:37:03 -0600
From:   Josh Poimboeuf <jpoimboe@...hat.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Andy Lutomirski <luto@...capital.net>, Pavel Machek <pavel@....cz>,
        kernel list <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Andrew Lutomirski <luto@...nel.org>,
        Borislav Petkov <bp@...en8.de>,
        Brian Gerst <brgerst@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        Peter Anvin <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: v4.10: kernel stack frame pointer .. has bad value (null)

On Tue, Mar 07, 2017 at 10:40:14AM -0800, Linus Torvalds wrote:
> On Tue, Mar 7, 2017 at 10:28 AM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> >
> > Also, the gcc documentation says -maccumulate-outgoing-args is
> > "generally beneficial for performance and size."
> 
> Hmm. I wonder how true that is. I'm pretty sure it generates bigger
> code, although it's probably less noticeable in the kernel (as opposed
> to the traditional x86 "push everything" model) due to having the
> three register arguments.

It does seem to make it bigger.  With Pavel's config on gcc 6, if I add
-maccumulate-outgoing-args:

   text	   data	    bss	    dec	    hex	filename
12692555	5550652	9146368	27389575	1a1ee87	vmlinux.before
13179531	5546556	9146368	27872455	1a94cc7	vmlinux.after

That's 3.8% more text on x86-32.

(FWIW, on x86-64, the size difference is negligible.)
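
For anyone who wants to check the x86-32 figure, here is a minimal sketch
that recomputes it from the text column of the size output above.  The
constant and function names are made up for illustration; only the two
byte counts come from the measurement.

```python
# Text-segment sizes copied from the `size` output above (x86-32, gcc 6).
TEXT_BEFORE = 12692555  # vmlinux.before
TEXT_AFTER = 13179531   # vmlinux.after, built with -maccumulate-outgoing-args


def growth_percent(before, after):
    """Return the relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


delta = TEXT_AFTER - TEXT_BEFORE
print(f"text grew by {delta} bytes "
      f"({growth_percent(TEXT_BEFORE, TEXT_AFTER):.1f}%)")
# prints: text grew by 486976 bytes (3.8%)
```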

> And the "it's faster" is almost certainly garbage. It's true on P4 and
> some older AMD cores that couldn't do push/pops quickly.
> 
> > Not to mention the fact that -maccumulate-outgoing-args seems to already
> > be enabled in most cases anyway.
> 
> Yeah, that's the main argument for this patch, I think - just remove
> the (unusual) special case.

As it turns out, when optimizing for size, gcc seems to ignore
-maccumulate-outgoing-args completely.  So I guess we would have to live
with both cases anyway, which means I'll need to make the unwinder
smart enough to deal with both.

But that brings up another question.  If -maccumulate-outgoing-args is
ignored with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, wouldn't using that option
break the things which require -maccumulate-outgoing-args?

So, looking deeper at the various reasons this flag is enabled, they
seem to be mostly obsolete.

- CONFIG_FUNCTION_GRAPH_TRACER sets it on x86-32 because of a gcc bug
  where the stack gets aligned before the mcount call.  This issue
  should be mostly obsolete as most modern compilers now have -mfentry.
  We could make it dependent on CC_USING_FENTRY.

- CONFIG_JUMP_LABEL sets it on x86-32 because of a bug in gcc <= 4.5.1
  which has since been fixed with
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226.  We could probably
  make it gcc-version-dependent.

- x86-64 sets it, apparently to keep the no-longer-in-tree DWARF unwinder
  happy with older versions of gcc.

So it looks like -maccumulate-outgoing-args isn't actually needed in
most cases.
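
The three cases above can be summarized as a small decision function.
This is only a sketch of the proposed conditions, not the actual Makefile
logic: the function name, its parameters, and the version tuple encoding
are all hypothetical, and it ignores the separate problem that gcc seems
to drop the flag when optimizing for size.

```python
def needs_accumulate_outgoing_args(arch,
                                   function_graph_tracer=False,
                                   cc_using_fentry=False,
                                   jump_label=False,
                                   gcc_version=(6, 0, 0)):
    """Hypothetical model of when the flag would still be required.

    arch is "x86-32" or "x86-64"; gcc_version is a (major, minor, patch)
    tuple.  All names here are illustrative assumptions.
    """
    if arch != "x86-32":
        # The x86-64 use was for the DWARF unwinder, which is no longer
        # in the tree, so nothing there requires the flag any more.
        return False

    # Function graph tracer: only a problem with mcount, not with -mfentry.
    if function_graph_tracer and not cc_using_fentry:
        return True

    # Jump labels: only affected by the bug in gcc <= 4.5.1.
    if jump_label and gcc_version <= (4, 5, 1):
        return True

    return False


# With a gcc that supports -mfentry and has the jump label fix:
print(needs_accumulate_outgoing_args("x86-32",
                                     function_graph_tracer=True,
                                     cc_using_fentry=True,
                                     jump_label=True,
                                     gcc_version=(6, 0, 0)))
# prints: False
```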

-- 
Josh
