Date:	Fri, 15 May 2015 22:52:43 +0200
From:	Denys Vlasenko <dvlasenk@...hat.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andy Lutomirski <luto@...capital.net>,
	Davidlohr Bueso <dave@...olabs.net>,
	Peter Anvin <hpa@...or.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Borislav Petkov <bp@...en8.de>,
	Peter Zijlstra <peterz@...radead.org>,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Brian Gerst <brgerst@...il.com>,
	Paul McKenney <paulmck@...ux.vnet.ibm.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...nel.org>, Jason Low <jason.low2@...com>
CC:	"linux-tip-commits@...r.kernel.org" 
	<linux-tip-commits@...r.kernel.org>
Subject: Re: [tip:x86/asm] x86: Pack function addresses tightly as well

On 05/15/2015 08:36 PM, Linus Torvalds wrote:
> On Fri, May 15, 2015 at 2:39 AM, tip-bot for Ingo Molnar
> <tipbot@...or.com> wrote:
>>
>> We can pack function addresses tightly as well:
> 
> So I really want to see performance numbers on a few
> microarchitectures for this one in particular.
> 
> The kernel generally doesn't have loops (well, not the kinds of
> high-rep loops that tend to be worth aligning), and I think the
> general branch/loop alignment is likely fine. But the function
> alignment doesn't tend to have the same kind of I$ advantages, it's
> more likely purely a size issue and not as interesting. Function
> targets are also more likely to be not in the cache, I suspect, since
> you don't have a loop priming it or a short forward jump that just got
> the cacheline anyway. And then *not* aligning the function would
> actually tend to make it *less* dense in the I$.

How about taking an intermediate step and using -falign-functions=6?
This means "align to 8 if it requires skipping less than 6 bytes".

Why < 6? Because with CONFIG_FTRACE=y, every function starts with
a 5-byte instruction ("call ftrace", replaced by a 5-byte nop).
We want at least this one insn to be decoded at once.
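
A rough way to check that claim, as a sketch only: the helper below models
the rule as worded above ("align to the next power of two >= N if that skips
fewer than N bytes"); it is not gcc's implementation. The names padding() and
first_insn_fits() are made up for illustration, and treating "decoded at once"
as "does not cross an 8-byte boundary" is an assumption of the sketch.

```python
def padding(addr: int, n: int) -> int:
    """Padding bytes inserted before a function that would start at addr,
    under the rule as described in this message (a model, not gcc itself)."""
    align = 1
    while align < n:          # smallest power of two >= n (8 for n=6)
        align *= 2
    skip = -addr % align      # bytes needed to reach the next boundary
    return skip if skip < n else 0


def first_insn_fits(addr: int, n: int = 6, insn_len: int = 5) -> bool:
    """True if an insn_len-byte instruction at the (possibly padded) function
    start lies entirely inside one aligned 8-byte block."""
    start = addr + padding(addr, n)
    return start % 8 + insn_len <= 8


# Walk through every possible start offset within an 8-byte block.
for off in range(8):
    print(off, padding(off, 6), first_insn_fits(off))
```

Under this model only offsets 1 and 2 are left unaligned, and a 5-byte
instruction starting there still ends before the next 8-byte boundary.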

Without CONFIG_FTRACE, it's not as clear-cut, but typical x86
insns are 5 bytes or less, so it will still make most functions
start executing reasonably quickly at the cost of only 2.5 bytes of padding
on average.
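
The padding figure can be reproduced with a small sketch. It assumes
unpadded function starts are spread uniformly over the 8 offsets within a
block, which is an assumption of the sketch and not something measured here;
padding() is a hypothetical helper modelling the rule as worded above.

```python
def padding(addr: int, n: int) -> int:
    """Padding under 'align to the next power of two >= n if that skips
    fewer than n bytes' (a model of the rule, not gcc's implementation)."""
    align = 1
    while align < n:
        align *= 2
    skip = -addr % align
    return skip if skip < n else 0


# Padding for each start offset 0..7 with -falign-functions=6.
pads = [padding(off, 6) for off in range(8)]

# Offsets that end up on an 8-byte boundary after padding.
aligned = [p for off, p in enumerate(pads) if (off + p) % 8 == 0]

avg_all = sum(pads) / len(pads)            # over all eight offsets
avg_aligned = sum(aligned) / len(aligned)  # over functions that get aligned
print(pads, avg_all, avg_aligned)
```

In this model the 2.5-byte average is what the six offsets that end up
aligned cost; averaged over all eight offsets, including the two left
unaligned, it comes to 1.875 bytes.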

I'd prefer "align to 16 if it requires skipping less than 6 bytes"
because aligning to an 8-byte boundary which is not also a 16-byte
boundary doesn't make sense on modern CPUs (it can in fact hurt a bit),
but alas, gcc's option format doesn't allow that.

If you don't like the 8-byte alignment, the smallest option which would
align to 16 bytes is -falign-functions=9: it means
"align to 16 if it requires skipping less than 9 bytes".
Still significantly better than insane padding to 16 even if we are at
an address just a few bytes past cacheline start (0x1231 -> 0x1240).
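
To make the 0x1231 case concrete, here is a sketch comparing the two
settings. padding() is a hypothetical helper modelling the rule as worded
above, not gcc's implementation, and the addresses other than 0x1231 are
picked only for illustration.

```python
def padding(addr: int, n: int) -> int:
    """Padding under 'align to the next power of two >= n if that skips
    fewer than n bytes' (a model of the rule, not gcc's implementation)."""
    align = 1
    while align < n:
        align *= 2
    skip = -addr % align
    return skip if skip < n else 0


# Compare unconditional 16-byte alignment (n=16) with the n=9 variant.
for addr in (0x1231, 0x1237, 0x1238, 0x123f):
    full = addr + padding(addr, 16)   # -falign-functions=16
    capped = addr + padding(addr, 9)  # -falign-functions=9
    print(hex(addr), "->", hex(full), "vs", hex(capped))
```

With n=16 a function at 0x1231 is pushed to 0x1240 at the cost of 15 bytes;
with n=9 it stays at 0x1231, and only starts at 0x1238 or later are moved up
to 0x1240.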

One last thing: if CONFIG_CC_OPTIMIZE_FOR_SIZE=y, we probably
shouldn't do any alignment.
