Message-ID: <20170308173703.2h57rsltma3smbcm@treble>
Date: Wed, 8 Mar 2017 11:37:03 -0600
From: Josh Poimboeuf <jpoimboe@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andy Lutomirski <luto@...capital.net>, Pavel Machek <pavel@....cz>,
kernel list <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Andrew Lutomirski <luto@...nel.org>,
Borislav Petkov <bp@...en8.de>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
Peter Anvin <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: v4.10: kernel stack frame pointer .. has bad value (null)
On Tue, Mar 07, 2017 at 10:40:14AM -0800, Linus Torvalds wrote:
> On Tue, Mar 7, 2017 at 10:28 AM, Josh Poimboeuf <jpoimboe@...hat.com> wrote:
> >
> > Also, the gcc documentation says -maccumulate-outgoing-args is
> > "generally beneficial for performance and size."
>
> Hmm. I wonder how true that is. I'm pretty sure it generates bigger
> code, although it's probably less noticeable in the kernel (as opposed
> to the traditional x86 "push everything" model) due to having the
> three register arguments.
It does seem to make it bigger. With Pavel's config on gcc 6, if I add
-maccumulate-outgoing-args:
    text     data      bss      dec      hex filename
12692555  5550652  9146368 27389575  1a1ee87 vmlinux.before
13179531  5546556  9146368 27872455  1a94cc7 vmlinux.after
That's 3.8% more text on x86-32.
(FWIW, on x86-64, the size difference is negligible.)
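The 3.8% figure follows directly from the `size` output: text grows by 13179531 - 12692555 = 486976 bytes, and "dec" is just text + data + bss. A small C sketch of that arithmetic (the struct and function names are illustrative only, not part of any kernel or binutils tool):

```c
/* Columns of a binutils `size` (Berkeley format) row for one file.
 * Hypothetical helper type for this sketch. */
struct size_row {
	unsigned long text, data, bss;
};

/* "dec" is text + data + bss; "hex" is the same total in base 16. */
static unsigned long size_dec(const struct size_row *r)
{
	return r->text + r->data + r->bss;
}

/* Relative growth of the text segment, in percent. */
static double text_growth_pct(const struct size_row *before,
			      const struct size_row *after)
{
	return 100.0 * ((double)after->text - (double)before->text) /
	       (double)before->text;
}
```

With the two rows above, `size_dec()` gives 27389575 (0x1a1ee87) and 27872455 (0x1a94cc7), and `text_growth_pct()` gives about 3.84.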
> And the "it's faster" is almost certainly garbage. It's true on P4 and
> some older AMD cores that couldn't do push/pops quickly.
>
> > Not to mention the fact that -maccumulate-outgoing-args seems to already
> > be enabled in most cases anyway.
>
> Yeah, that's the main argument for this patch, I think - just remove
> the (unusual) special case.
As it turns out, when optimizing for size, gcc seems to ignore
-maccumulate-outgoing-args completely. So I guess we would have to live
with both cases anyway, which means I'll need to make the unwinder
smart enough to deal with it.
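For readers less familiar with frame-pointer unwinding, here is a user-space simulation (not the kernel's unwinder) of walking a chain of saved frame pointers and rejecting values that fall outside the stack. The struct mirrors the layout left by a `push %ebp; mov %esp,%ebp` prologue; `struct stack_bounds`, `frame_on_stack()` and `walk_frames()` are hypothetical names for this sketch. Teaching a real unwinder about the different frame layouts gcc can emit is more involved than this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Saved caller frame pointer, with the return address just above it. */
struct stack_frame {
	struct stack_frame *next_frame;
	unsigned long return_address;
};

/* Hypothetical bounds of the stack being walked: [begin, end). */
struct stack_bounds {
	const void *begin;
	const void *end;
};

/* Only trust a frame pointer if the whole frame lies inside the stack. */
static bool frame_on_stack(const struct stack_frame *fp,
			   const struct stack_bounds *sb)
{
	uintptr_t p = (uintptr_t)fp;

	return p >= (uintptr_t)sb->begin &&
	       p + sizeof(*fp) <= (uintptr_t)sb->end;
}

/*
 * Walk the frame-pointer chain, recording up to max return addresses.
 * Returns the number recorded, or -1 on a bad frame pointer (outside
 * the stack, or not moving toward higher addresses). In this sketch a
 * NULL frame pointer simply ends the walk; a stricter unwinder may also
 * require the chain to end at a known location and warn otherwise.
 */
static int walk_frames(const struct stack_frame *fp,
		       const struct stack_bounds *sb,
		       unsigned long *out, int max)
{
	int n = 0;

	while (fp && n < max) {
		if (!frame_on_stack(fp, sb))
			return -1;
		out[n++] = fp->return_address;
		/* x86 stacks grow down, so callers' frames sit higher up. */
		if (fp->next_frame &&
		    (uintptr_t)fp->next_frame <= (uintptr_t)fp)
			return -1;
		fp = fp->next_frame;
	}
	return n;
}
```

An array of `struct stack_frame` can stand in for the stack when trying this out, with each `next_frame` pointing at a higher element.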
But that brings up another question. If -maccumulate-outgoing-args is
ignored with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, wouldn't using that option
break the things which require -maccumulate-outgoing-args?
So I looked deeper at the various reasons this flag is enabled, and they
seem to be mostly obsolete:
- CONFIG_FUNCTION_GRAPH_TRACER sets it on x86-32 because of a gcc bug
where the stack gets aligned before the mcount call. This issue
should be mostly obsolete as most modern compilers now have -mfentry.
We could make it dependent on CC_USING_FENTRY.
- CONFIG_JUMP_LABEL sets it on x86-32 because of a bug in gcc <= 4.5.1
which has since been fixed with
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226. We could probably
make it gcc-version-dependent.
- x86-64 sets it to apparently make the no-longer-in-tree DWARF unwinder
happy with older versions of gcc.
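The conditions in the list above can be summarized as a small predicate. This C sketch models the reasoning only; it is not the kernel's Makefile logic, and `struct build_cfg`, its fields, and `needs_accumulate_outgoing_args()` are hypothetical names loosely based on the config symbols mentioned. The gcc version is assumed to be encoded as major*10000 + minor*100 + patchlevel.

```c
#include <stdbool.h>

/* Hypothetical summary of the build configuration. */
struct build_cfg {
	bool x86_32;                /* CONFIG_X86_32 */
	bool function_graph_tracer; /* CONFIG_FUNCTION_GRAPH_TRACER */
	bool cc_using_fentry;       /* CC_USING_FENTRY */
	bool jump_label;            /* CONFIG_JUMP_LABEL */
	int gcc_version;            /* e.g. 40501 for gcc 4.5.1 */
};

/* Would -maccumulate-outgoing-args still be needed under the proposals? */
static bool needs_accumulate_outgoing_args(const struct build_cfg *c)
{
	/* x86-64: only the no-longer-in-tree DWARF unwinder wanted it. */
	if (!c->x86_32)
		return false;

	/* Graph tracer: the stack-aligned-before-mcount gcc bug; the
	 * proposal is to require the flag only without -mfentry. */
	if (c->function_graph_tracer && !c->cc_using_fentry)
		return true;

	/* Jump labels: gcc bug 46226, affecting gcc <= 4.5.1. */
	if (c->jump_label && c->gcc_version <= 40501)
		return true;

	return false;
}
```

For example, an x86-32 build with CONFIG_JUMP_LABEL on gcc 4.5.1 would still need the flag, while the same config on gcc 6 would not.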
So it looks like -maccumulate-outgoing-args isn't actually needed in
most cases.
--
Josh