Message-ID: <55565C9B.3020607@redhat.com>
Date: Fri, 15 May 2015 22:52:43 +0200
From: Denys Vlasenko <dvlasenk@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Andy Lutomirski <luto@...capital.net>,
Davidlohr Bueso <dave@...olabs.net>,
Peter Anvin <hpa@...or.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Borislav Petkov <bp@...en8.de>,
Peter Zijlstra <peterz@...radead.org>,
"Chandramouleeswaran, Aswin" <aswin@...com>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Brian Gerst <brgerst@...il.com>,
Paul McKenney <paulmck@...ux.vnet.ibm.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...nel.org>, Jason Low <jason.low2@...com>
CC: "linux-tip-commits@...r.kernel.org"
<linux-tip-commits@...r.kernel.org>
Subject: Re: [tip:x86/asm] x86: Pack function addresses tightly as well
On 05/15/2015 08:36 PM, Linus Torvalds wrote:
> On Fri, May 15, 2015 at 2:39 AM, tip-bot for Ingo Molnar
> <tipbot@...or.com> wrote:
>>
>> We can pack function addresses tightly as well:
>
> So I really want to see performance numbers on a few
> microarchitectures for this one in particular.
>
> The kernel generally doesn't have loops (well, not the kinds of
> high-rep loops that tend to be worth aligning), and I think the
> general branch/loop alignment is likely fine. But the function
> alignment doesn't tend to have the same kind of I$ advantages, it's
> more likely purely a size issue and not as interesting. Function
> targets are also more likely to be not in the cache, I suspect, since
> you don't have a loop priming it or a short forward jump that just got
> the cacheline anyway. And then *not* aligning the function would
> actually tend to make it *less* dense in the I$.
How about taking an intermediate step and using -falign-functions=6?
This means "align to 8 if it requires skipping less than 6 bytes".
Why < 6? Because with CONFIG_FTRACE=y, every function starts with
a 5-byte instruction ("call ftrace", replaced by a 5-byte nop).
We want at least this one insn to be decoded at once.
Without CONFIG_FTRACE, it's not as clear-cut, but typical x86
insns are 5 bytes or less, so it will still make most functions
start executing reasonably quickly at the cost of only 2.5 bytes of padding
on average.
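To make the rule concrete, here is a minimal Python sketch of it. It models
only the rule as worded in this message ("align to the next power of two if
that skips fewer than n bytes"); the helper name aligned_start is hypothetical,
and this is not a model of what gcc and the assembler actually emit.

```python
# Simplified model of the rule described above; not gcc's actual code.
def aligned_start(addr: int, n: int) -> int:
    """Start address of a function that would otherwise begin at addr,
    under the rule described for -falign-functions=n."""
    boundary = 1 << (n - 1).bit_length()    # smallest power of two >= n
    pad = -addr % boundary                  # bytes needed to reach the boundary
    return addr + pad if pad < n else addr  # pad only if fewer than n bytes are skipped

# Show the outcome for every offset within one 8-byte block, with n = 6.
for offset in range(8):
    addr = 0x1000 + offset
    print(f"{addr:#x} -> {aligned_start(addr, 6):#x}")
```

With n = 6, a function that would start 1 or 2 bytes past an 8-byte boundary
is left where it is, because reaching the next boundary would skip 7 or 6 bytes.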
I'd prefer "align to 16 if it requires skipping less than 6 bytes"
because aligning to an 8-byte boundary that is not also a 16-byte
boundary doesn't make sense on modern CPUs (it can in fact hurt a bit),
but alas, gcc's option format doesn't allow that.
If you don't like the 8-byte alignment, the smallest option which would
align to 16 bytes is -falign-functions=9: it means
"align to 16 if it requires skipping less than 9 bytes".
Still significantly better than insane padding to 16 even if we are at
an address just a few bytes past cacheline start (0x1231 -> 0x1240).
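The 0x1231 case can be checked with a small Python sketch that compares
unconditional 16-byte alignment with the capped variant. The padding helper and
its parameters are hypothetical names used only for illustration; it assumes
the rule as worded here (pad to the boundary only when at most max_skip bytes
are skipped) and does not model the compiler's real output.

```python
# Illustrative only: bytes of padding inserted before a function at addr.
def padding(addr: int, boundary: int, max_skip: int) -> int:
    pad = -addr % boundary                # bytes needed to reach the boundary
    return pad if pad <= max_skip else 0  # give up if too many bytes would be skipped

addr = 0x1231
always = padding(addr, 16, 15)  # plain 16-byte alignment: any skip is accepted
capped = padding(addr, 16, 8)   # "skipping less than 9 bytes", i.e. at most 8

print(f"always align to 16: {addr:#x} -> {addr + always:#x} ({always} bytes of padding)")
print(f"skip less than 9:   {addr:#x} -> {addr + capped:#x} ({capped} bytes of padding)")
```

Under this model the capped variant leaves 0x1231 unpadded, while a function
at 0x1238 would still be moved to 0x1240 at a cost of 8 bytes.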
One last thing: if CONFIG_CC_OPTIMIZE_FOR_SIZE=y, we probably
shouldn't do any alignment.