Message-ID: <20150706173223.GA30566@gmail.com>
Date: Mon, 6 Jul 2015 19:32:23 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Andy Lutomirski <luto@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
the arch/x86 maintainers <x86@...nel.org>,
Jan Kara <jack@...e.cz>, Borislav Petkov <bp@...en8.de>,
Denys Vlasenko <dvlasenk@...hat.com>
Subject: Re: [PATCH] x86: Fix detection of GCC -mpreferred-stack-boundary support

* Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Mon, Jul 6, 2015 at 6:44 AM, Ingo Molnar <mingo@...nel.org> wrote:
> >
> > So looking at this I question the choice of -mpreferred-stack-boundary=3. Why
> > not do -mpreferred-stack-boundary=2?
>
> It wouldn't make sense anyway - it would only make code worse (if it worked) and
> not any better.
>
> The reason the "=3" value is good is because 8-byte alignment is the "natural"
> alignment - it's what you get with a normal call sequence, simply because the
> return address is 8 bytes in size.
>
> That means that with "=3" you don't get extra code to align the stack for the
> simple functions that don't need a frame.
>
> Anything smaller than 3 wouldn't help even if it worked, because none of the
> normal stack operations (pushing/popping registers to save/restore them) would
> be any smaller anyway.
>
> But bigger values than 3 result in the compiler having to generate extra stack
> adjustments just to align the stack after a call that very naturally mis-aligned
> it. And it doesn't help anyway, since in the kernel we don't put stuff on the
> stack that needs bigger alignment (ok, the fxsave buffer is a counter-example,
> but it's a very odd one that we _shouldn't_ have put on the stack).

Ok, so it's all moot, but my (quite possibly flawed) thinking was that for deeper
call chains, using 4-byte RSP alignment (as opposed to 8 bytes) would allow the
stack frame to be 4 bytes narrower in about 50% of the cases (depending on
whether the 'natural' stack boundary is already aligned to 8 bytes or not).

For a 10-deep call chain that's a stack about 20 bytes more compact on average
(10*4*0.5), resulting in a tiny bit denser D$.

My assumptions were:

 - no extra code is generated by GCC. (If it causes any extra code to be
   generated then it's an obvious loss.)

 - mis-aligning an 8-byte variable by 4 bytes is handled quite well by most
   x86 uarchs, without penalty in most cases.

But ... it's all moot and even in the best case if both my assumptions are fully
met (which is not a given), the advantages are pretty marginal, so consider the
idea dead by multiple mortal wounds.
Thanks,
Ingo