[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20170918174031.u5bb6oxpsr6u7gad@treble>
Date: Mon, 18 Sep 2017 12:40:31 -0500
From: Josh Poimboeuf <jpoimboe@...hat.com>
To: Andrey Ryabinin <aryabinin@...tuozzo.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
the arch/x86 maintainers <x86@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>,
Andy Lutomirski <luto@...nel.org>,
Alexander Potapenko <glider@...gle.com>,
Dmitriy Vyukov <dvyukov@...gle.com>,
Matthias Kaehlcke <mka@...omium.org>,
Arnd Bergmann <arnd@...db.de>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [RFC PATCH 3/4] x86/asm: Make alternative macro interfaces more
clear and consistent
On Sun, Sep 17, 2017 at 01:22:32AM +0300, Andrey Ryabinin wrote:
> On 09/16/2017 02:29 AM, Josh Poimboeuf wrote:
> > On Fri, Sep 15, 2017 at 11:01:19AM -0700, Linus Torvalds wrote:
> >> On Fri, Sep 15, 2017 at 9:53 AM, Andrey Ryabinin
> >> <aryabinin@...tuozzo.com> wrote:
> >>>
> >>> I'm not so sure that this is disabled optimization. I assume that global rsp makes
> >>> changes something in gcc's register allocation logic, or something like that leading
> >>> to subtle changes in generated code.
> >>>
> >>> But what I recently find out, is that this "regression" sometimes is actually improvement in .text size.
> >>> It all depends on .config, e.g:
> >>
> >> Oh, that would be lovely and solve all the issues.
> >>
> >> And looking at the code generation differences for one file
> >> (kernel/futex.c) and one single config (my default config), the thing
> >> that the global stack register seems to change is that it moves some
> >> code - particularly completely unrelated inline asm code - inside the
> >> region protected by frame pointers.
> >>
> >> There are a few register allocation changes too, but they didn't seem
> >> to make code worse, and I think they were just "incidental" from code
> >> movement. And most code movement really seemed to be around inline
> >> asms, I wonder if the gcc logic simply is something like "if the
> >> stack pointer is visible as a register, don't move any inline asm
> >> across a frame setup".
> >>
> >> In fact, on that one file and one configuration, the resulting
> >> assembler file had three fewer lines of code with that global stack
> >> register declaration than with the local one.
> >>
> >> So at least from just that one case, I can back up Andrey's
> >> observation: it's not that the code gets worse, it just is slightly
> >> different. Sometimes it's better.
> >>
> >> So maybe that simple patch to just make the stack pointer be a global
> >> register declaration really is the fix for this issue.
> >>
> >> It's not *pretty*, and I'd much rather just see some explicit way for
> >> us to say "this asm wants the frame to be set up", but of the
> >> alternatives we've seen, maybe it's the right thing to do?
> >
> > Ok, here's the (compile tested) patch in case anybody wants to try it
> > out.
> >
>
> I already had almost the same patch, so I just used mine. Patch seems to be
> the same as yours except cosmetic details which shouldn't affect the end result.
> But just in case it posted below.
>
>
> The patch was applied on top of b38923a068c10fc36ca8f596d650d095ce390b85 commit,
> kernel compiled gcc 7.2.0
>
> allnoconfig: text data bss dec hex filename
> base 946920 206384 1215752 2369056 242620 allnoconfig/vmlinux
> patched 946501 206384 1215752 2368637 24247d allnoconfig_p/vmlinux
>
> defconfig: text data bss dec hex filename
> base 10729978 4840000 876544 16446522 faf43a defconfig/vmlinux
> patched 10730666 4840000 876544 16447210 faf6ea defconfig_p/vmlinux
>
> allyesconfig: text data bss dec hex filename
> base 161016572 152888347 49303552 363208471 15a61f17 allyesconfig/vmlinux
> patched 161003557 152888347 49303552 363195456 15a5ec40 allyesconfig_p/vmlinux
I get similar results with my config: slightly smaller .text, both with
*and* without frame pointers. So yeah, this is probably the best
option, or at least, the least-worst option. I shouldn't have dismissed
the idea so quickly before.
(H.J. Lu suggested another idea, which is to use "rbp" as an input
constraint. I tried it, but it added 22k of text to my kernel with
frame pointers disabled, so that's definitely worse than this.)
If testing continues to goes well, I'll submit an official patch soon.
--
Josh
Powered by blists - more mailing lists