Message-ID: <E151C478-11A4-4B11-82A5-9A4D637EFEC2@vmware.com>
Date:   Mon, 11 Jun 2018 03:47:12 +0000
From:   Nadav Amit <namit@...are.com>
To:     "hpa@...or.com" <hpa@...or.com>
CC:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        the arch/x86 maintainers <x86@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Kate Stewart <kstewart@...uxfoundation.org>,
        Philippe Ombredanne <pombredanne@...b.com>,
        Arnd Bergmann <arnd@...db.de>
Subject: Re: [PATCH v3 9/9] x86: jump-labels: use macros instead of inline
 assembly

at 6:29 PM, hpa@...or.com wrote:

> On June 10, 2018 7:19:11 AM PDT, Nadav Amit <namit@...are.com> wrote:
>> Use assembly macros for jump labels and call them from inline assembly.
>> This not only makes the code more readable, but also improves
>> compilation decisions, specifically the inlining decisions that GCC
>> bases on the number of newlines in the inline assembly.
>> 
>> As a result the code size is slightly increased.
>> 
>>  text	   data	    bss	    dec	    hex	filename
>> 18163528 10226300 2957312 31347140 1de51c4 ./vmlinux before
>> 18163608 10227348 2957312 31348268 1de562c ./vmlinux after (+1128)
>> 
>> And functions such as intel_pstate_adjust_policy_max(),
>> kvm_cpu_accept_dm_intr(), kvm_register_read() are inlined.
>> 
>> Cc: Thomas Gleixner <tglx@...utronix.de>
>> Cc: Ingo Molnar <mingo@...hat.com>
>> Cc: "H. Peter Anvin" <hpa@...or.com>
>> Cc: x86@...nel.org
>> Cc: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
>> Cc: Kate Stewart <kstewart@...uxfoundation.org>
>> Cc: Philippe Ombredanne <pombredanne@...b.com>
>> 
>> Signed-off-by: Nadav Amit <namit@...are.com>
>> ---
>> arch/x86/include/asm/jump_label.h | 65 ++++++++++++++++++-------------
>> arch/x86/kernel/macros.S          |  1 +
>> 2 files changed, 39 insertions(+), 27 deletions(-)
>> 
>> diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
>> index 8c0de4282659..ea0633a41122 100644
>> --- a/arch/x86/include/asm/jump_label.h
>> +++ b/arch/x86/include/asm/jump_label.h
>> @@ -2,19 +2,6 @@
>> #ifndef _ASM_X86_JUMP_LABEL_H
>> #define _ASM_X86_JUMP_LABEL_H
>> 
>> -#ifndef HAVE_JUMP_LABEL
>> -/*
>> - * For better or for worse, if jump labels (the gcc extension) are missing,
>> - * then the entire static branch patching infrastructure is compiled out.
>> - * If that happens, the code in here will malfunction.  Raise a compiler
>> - * error instead.
>> - *
>> - * In theory, jump labels and the static branch patching infrastructure
>> - * could be decoupled to fix this.
>> - */
>> -#error asm/jump_label.h included on a non-jump-label kernel
>> -#endif
>> -
>> #define JUMP_LABEL_NOP_SIZE 5
>> 
>> #ifdef CONFIG_X86_64
>> @@ -28,18 +15,27 @@
>> 
>> #ifndef __ASSEMBLY__
>> 
>> +#ifndef HAVE_JUMP_LABEL
>> +/*
>> + * For better or for worse, if jump labels (the gcc extension) are missing,
>> + * then the entire static branch patching infrastructure is compiled out.
>> + * If that happens, the code in here will malfunction.  Raise a compiler
>> + * error instead.
>> + *
>> + * In theory, jump labels and the static branch patching infrastructure
>> + * could be decoupled to fix this.
>> + */
>> +#error asm/jump_label.h included on a non-jump-label kernel
>> +#endif
>> +
>> #include <linux/stringify.h>
>> #include <linux/types.h>
>> 
>> static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
>> {
>> -	asm_volatile_goto("1:"
>> -		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
>> -		".pushsection __jump_table,  \"aw\" \n\t"
>> -		_ASM_ALIGN "\n\t"
>> -		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
>> -		".popsection \n\t"
>> -		: :  "i" (key), "i" (branch) : : l_yes);
>> +	asm_volatile_goto("STATIC_BRANCH_GOTO l_yes=\"%l[l_yes]\" key=\"%c0\"
>> "
>> +			  "branch=\"%c1\""
>> +			: :  "i" (key), "i" (branch) : : l_yes);
>> 
>> 	return false;
>> l_yes:
>> @@ -48,13 +44,8 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
>> 
>> static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
>> {
>> -	asm_volatile_goto("1:"
>> -		".byte 0xe9\n\t .long %l[l_yes] - 2f\n\t"
>> -		"2:\n\t"
>> -		".pushsection __jump_table,  \"aw\" \n\t"
>> -		_ASM_ALIGN "\n\t"
>> -		_ASM_PTR "1b, %l[l_yes], %c0 + %c1 \n\t"
>> -		".popsection \n\t"
>> +	asm_volatile_goto("STATIC_BRANCH_JUMP_GOTO l_yes=\"%l[l_yes]\"
>> key=\"%c0\" "
>> +			  "branch=\"%c1\""
>> 		: :  "i" (key), "i" (branch) : : l_yes);
>> 
>> 	return false;
>> @@ -108,6 +99,26 @@ struct jump_entry {
>> 	.popsection
>> .endm
>> 
>> +.macro STATIC_BRANCH_GOTO l_yes:req key:req branch:req
>> +1:
>> +	.byte STATIC_KEY_INIT_NOP
>> +	.pushsection __jump_table, "aw"
>> +	_ASM_ALIGN
>> +	_ASM_PTR 1b, \l_yes, \key + \branch
>> +	.popsection
>> +.endm
>> +
>> +.macro STATIC_BRANCH_JUMP_GOTO l_yes:req key:req branch:req
>> +1:
>> +	.byte 0xe9
>> +	.long \l_yes - 2f
>> +2:
>> +	.pushsection __jump_table, "aw"
>> +	_ASM_ALIGN
>> +	_ASM_PTR 1b, \l_yes, \key + \branch
>> +	.popsection
>> +.endm
>> +
>> #endif	/* __ASSEMBLY__ */
>> 
>> #endif
>> diff --git a/arch/x86/kernel/macros.S b/arch/x86/kernel/macros.S
>> index bf8b9c93e255..161c95059044 100644
>> --- a/arch/x86/kernel/macros.S
>> +++ b/arch/x86/kernel/macros.S
>> @@ -13,3 +13,4 @@
>> #include <asm/paravirt.h>
>> #include <asm/asm.h>
>> #include <asm/cpufeature.h>
>> +#include <asm/jump_label.h>
> 
> If the code size increases, do you have any metrics for improvement?
> 
> That being said, I do like the readability improvements and the ability to
> use gas macros in assembly as opposed to cpp macros wrapped in different
> ways for C and assembly.

I didn’t try to measure the performance impact of each patch independently.
I did measure the impact of the paravirt patch, since it was relatively easy
to see that before the change a lot of page-table manipulation functions
were not inlined; with the change, I got a 2% performance improvement on a
microbenchmark of repeated page-fault + MADV_DONTNEED.
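
A minimal sketch of that kind of microbenchmark (illustrative only; the
iteration count and setup are assumptions, not the exact harness used):

	#include <stdlib.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t size = 4096;
		char *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;

		for (int i = 0; i < 1000000; i++) {
			p[0] = 1;                        /* touch: triggers a page fault */
			madvise(p, size, MADV_DONTNEED); /* drop the page again */
		}
		return 0;
	}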

For this patch alone, it might be hard to see an effect on performance.
However, functions in kvm, btrfs, ext4, xen, xfs, and other subsystems are
affected.

Sampling some changes shows that the compiler makes reasonable decisions
that can increase the code size. For example, with this patch
kvm_pmi_trigger_fn() gets kvm_pmu_deliver_pmi() inlined. Since
kvm_pmu_deliver_pmi() is used in a different object (x86.o), a non-inlined
instance of the function is produced as well, which causes the binary size
to (slightly) increase.
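
To make the mechanism concrete, here is a generic sketch (hypothetical
names, not the actual kvm code): when a function with external linkage is
inlined into a local caller, the compiler must still emit a standalone copy
for callers in other objects:

	/* callee.c -- hypothetical example */
	int deliver(int v)	/* external linkage: other objects call it too */
	{
		return v * 2 + 1;
	}

	int trigger(int v)
	{
		/*
		 * With the extra inlining headroom, the compiler can inline
		 * deliver() here.  But since deliver() is also called from
		 * another object file, an out-of-line copy is still emitted,
		 * so the total code size grows slightly.
		 */
		return deliver(v);
	}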

I think the value of this series is in its overall impact: all it does is
prevent the compiler from making wrong decisions. It can presumably also
make __always_inline annotations unnecessary, such as the one in a recent
patch [1], where I suspect the underlying issue was caused by the use of
static_cpu_has().
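
The underlying technique, reduced to a self-contained toy (x86 and GCC
assumed; the macro name is made up, and the kernel routes the real macros
through macros.S rather than a file-scope asm()):

	/* Define a gas macro once; GCC copies this text into the .s file. */
	asm(".macro SET_TO_THREE reg:req\n\t"
	    "movl $3, \\reg\n"
	    ".endm");

	static inline int three(void)
	{
		int ret;

		/*
		 * The inline asm body is now a single line, so GCC's
		 * inlining cost estimate no longer charges this function
		 * for the instructions hidden inside the macro.
		 */
		asm("SET_TO_THREE %0" : "=r" (ret));
		return ret;
	}

	int main(void)
	{
		return three();	/* exits with status 3 */
	}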

Regards,
Nadav

[1] https://patchwork.kernel.org/patch/10450037/
