Message-ID: <A74E4351-A9FA-434D-AD47-3A509F802FFB@vmware.com>
Date:   Tue, 1 May 2018 06:50:14 +0000
From:   Nadav Amit <namit@...are.com>
To:     Josh Poimboeuf <jpoimboe@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>
CC:     Ingo Molnar <mingo@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Suboptimal inline heuristics due to non-code sections

When gcc considers the size of a function for inlining decisions, it
apparently considers *all* sections. Since the kernel extensively uses
sections for things other than code (e.g., exception-table, bug-table), the
optimality of these decisions seems questionable to me.

The objtool sections may be the most extreme case, as these sections are
discarded, while their size still appears to be considered by the inlining
heuristics. It may be beneficial not to consider (some of) the other sections
as well, as they do not affect code-caching but only increase the kernel
size.
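
To make the mechanism concrete, here is a minimal user-space sketch (not
kernel code) of an asm statement that emits data into a separate section and
adds no instructions to .text. The section name demo_table, the macro
DEMO_ANNOTATE() and the function names are made up for illustration, and an
ELF target with a GNU-compatible toolchain is assumed. As far as the GCC
manual describes it, the size of an asm statement is estimated from the
number of lines in its template, so the four directive lines below would be
counted as if they were four instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel annotation such as a __bug_table
 * entry: 8 bytes go to the made-up section "demo_table", none to .text. */
#define DEMO_ANNOTATE()                                      \
	__asm__ volatile(".pushsection demo_table,\"a\"\n\t" \
			 ".long 0x1234\n\t"                  \
			 ".long 0x5678\n\t"                  \
			 ".popsection")

/* For sections whose name is a valid C identifier, the linker provides
 * these two boundary symbols. */
extern const char __start_demo_table[];
extern const char __stop_demo_table[];

/* Tiny function whose only real code is "x + 1". */
static inline int tiny(int x)
{
	DEMO_ANNOTATE();
	return x + 1;
}

/* Non-static caller, so at least one copy of the annotation is emitted. */
int demo_caller(int x)
{
	return tiny(x);
}

/* Total bytes that ended up in demo_table (8 per emitted copy of tiny()). */
size_t demo_table_bytes(void)
{
	return (size_t)((uintptr_t)__stop_demo_table -
			(uintptr_t)__start_demo_table);
}
```

Each emitted copy of tiny(), inlined or not, contributes one 8-byte entry to
demo_table, which is the sense in which such sections grow the image without
touching the instruction stream.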

To illustrate the issue, consider the function copy_overflow():

   0xffffffff819315e0 <+0>:	push   %rbp
   0xffffffff819315e1 <+1>:	mov    %rsi,%rdx
   0xffffffff819315e4 <+4>:	mov    %edi,%esi
   0xffffffff819315e6 <+6>:	mov    $0xffffffff820bc4b8,%rdi
   0xffffffff819315ed <+13>:	mov    %rsp,%rbp
   0xffffffff819315f0 <+16>:	callq  0xffffffff81089b70 <__warn_printk>
   0xffffffff819315f5 <+21>:	ud2    
   0xffffffff819315f7 <+23>:	pop    %rbp
   0xffffffff819315f8 <+24>:	retq   

This function seems to me like a great candidate for inlining. Yet, in my 4.16
build (using gcc 7.2), I get 38 non-inlined instances of this function in
vmlinux. Forcing CONFIG_STACK_VALIDATION to be disabled reduces the number of
non-inlined instances to 35. Removing, in addition, the data that is saved
in the __bug_table causes all instances of the function to be inlined.

Obviously this particular function can be marked __always_inline, but the inline
heuristics seem to me to be wrongly biased.
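
For reference, a sketch of what that per-function workaround could look
like. It is a self-contained user-space approximation, loosely modeled on
the kernel's copy_overflow(): demo_warn(), demo_warn_count and
demo_check_copy() are hypothetical stand-ins for the real WARN() machinery
and its callers, and the macro only mirrors how the kernel defines
__always_inline on gcc.

```c
/* Guarded because some C libraries already define this macro. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* Hypothetical stand-in for the side effect of WARN(): count the calls. */
static int demo_warn_count;

static void demo_warn(int size, unsigned long count)
{
	(void)size;
	(void)count;
	demo_warn_count++;
}

/* With the attribute, gcc inlines this regardless of its size estimate. */
static __always_inline void copy_overflow(int size, unsigned long count)
{
	demo_warn(size, count);
}

/* Hypothetical caller: returns 1 if the copy fits, 0 (and warns) if not. */
int demo_check_copy(int size, unsigned long count)
{
	if (size >= 0 && (unsigned long)size < count) {
		copy_overflow(size, count);
		return 0;
	}
	return 1;
}

/* Accessor so callers can observe how many warnings were raised. */
int demo_warnings(void)
{
	return demo_warn_count;
}
```

This only sidesteps the heuristics for one function; it does not change how
the size of the other annotated functions is estimated.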

What do you think?

Is there a way to make gcc ignore sections in its inlining heuristics?

Thanks,
Nadav
