linux-kernel - Re: [PATCH] x86: Align jump targets to 1 byte boundaries


Message-ID: <5527C700.3030405@redhat.com>
Date:	Fri, 10 Apr 2015 14:50:08 +0200
From:	Denys Vlasenko <dvlasenk@...hat.com>
To:	Ingo Molnar <mingo@...nel.org>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
CC:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jason Low <jason.low2@...com>,
	Peter Zijlstra <peterz@...radead.org>,
	Davidlohr Bueso <dave@...olabs.net>,
	Tim Chen <tim.c.chen@...ux.intel.com>,
	Aswin Chandramouleeswaran <aswin@...com>,
	LKML <linux-kernel@...r.kernel.org>,
	Borislav Petkov <bp@...en8.de>,
	Andy Lutomirski <luto@...capital.net>,
	Brian Gerst <brgerst@...il.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: [PATCH] x86: Align jump targets to 1 byte boundaries

On 04/10/2015 02:08 PM, Ingo Molnar wrote:
> 
> * Ingo Molnar <mingo@...nel.org> wrote:
> 
>> So restructure the loop a bit, to get much tighter code:
>>
>> 0000000000000030 <mutex_spin_on_owner.isra.5>:
>>   30:	55                   	push   %rbp
>>   31:	65 48 8b 14 25 00 00 	mov    %gs:0x0,%rdx
>>   38:	00 00
>>   3a:	48 89 e5             	mov    %rsp,%rbp
>>   3d:	48 39 37             	cmp    %rsi,(%rdi)
>>   40:	75 1e                	jne    60 <mutex_spin_on_owner.isra.5+0x30>
>>   42:	8b 46 28             	mov    0x28(%rsi),%eax
>>   45:	85 c0                	test   %eax,%eax
>>   47:	74 0d                	je     56 <mutex_spin_on_owner.isra.5+0x26>
>>   49:	f3 90                	pause
>>   4b:	48 8b 82 10 c0 ff ff 	mov    -0x3ff0(%rdx),%rax
>>   52:	a8 08                	test   $0x8,%al
>>   54:	74 e7                	je     3d <mutex_spin_on_owner.isra.5+0xd>
>>   56:	31 c0                	xor    %eax,%eax
>>   58:	5d                   	pop    %rbp
>>   59:	c3                   	retq
>>   5a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
>>   60:	b8 01 00 00 00       	mov    $0x1,%eax
>>   65:	5d                   	pop    %rbp
>>   66:	c3                   	retq
> 
> Btw., totally off topic, the following NOP caught my attention:
> 
>>   5a:	66 0f 1f 44 00 00    	nopw   0x0(%rax,%rax,1)
> 
> That's a dead NOP that bloats the function a bit, added for the 16 byte 
> alignment of one of the jump targets.
> 
> I realize that x86 CPU manufacturers recommend 16-byte jump target 
> alignments (it's in the Intel optimization manual), but the cost of 
> that is very significant:
> 
>         text           data       bss         dec      filename
>     12566391        1617840   1089536    15273767      vmlinux.align.16-byte
>     12224951        1617840   1089536    14932327      vmlinux.align.1-byte
> 
> By using 1 byte jump target alignment (i.e. no alignment at all) we 
> get an almost 3% reduction in kernel size (!) - and a probably similar 
> reduction in I$ footprint.
> 
> So I'm wondering, is the 16 byte jump target optimization suggestion 
> really worth this price? The patch below boots fine and I've not 
> measured any noticeable slowdown, but I've not tried hard.

I am absolutely thrilled by the proposal to cut down on sadistic amounts
of alignment.

However, I'm an -Os guy. Expect -O2 people to disagree :)

New-ish versions of gcc allow people to specify optimization
options per function:

https://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html#Function-Attributes

optimize
    The optimize attribute is used to specify that a function is to be compiled
    with different optimization options than specified on the command line.
    Arguments can either be numbers or strings. Numbers are assumed to be an
    optimization level. Strings that begin with O are assumed to be an
    optimization option, while other options are assumed to be used with
    a -f prefix.

How about not aligning code by default, and using

    #define hot_func __attribute__((optimize("O2","align-functions=16","align-jumps=16")))
    ...

    void hot_func super_often_called_func(...) {...}

in hot code paths?
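
To make the idea concrete, here is a minimal sketch of how such a macro could
be used in one translation unit. The function names sum_bytes and
sum_bytes_cold are hypothetical, chosen only for illustration, and the
compiler guard is my assumption for builds with non-gcc compilers, where the
macro expands to nothing. I have not measured whether gcc honors the
alignment strings on every version, so treat this as an illustration, not a
tested kernel patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Per-function optimization options (gcc only). On other compilers the
 * macro is empty and the function uses the command-line options.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define hot_func __attribute__((optimize("O2", "align-functions=16", "align-jumps=16")))
#else
#define hot_func
#endif

/* Hypothetical hot path: opts in to -O2 and 16-byte alignment. */
hot_func unsigned long sum_bytes(const unsigned char *buf, size_t len)
{
	unsigned long sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/*
 * Hypothetical cold path: no attribute, so it is compiled with whatever
 * the command line says (e.g. -Os with no extra alignment).
 */
unsigned long sum_bytes_cold(const unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	return sum_bytes(buf, len);
}
```

Both functions return the same result; only code generation may differ.
Comparing "objdump -d" output for the two is one way to check whether the
padding NOPs actually appear. Note that the gcc manual describes the optimize
attribute as intended for debugging purposes only, so this would need a
closer look before relying on it.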
