lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <169745455193.3135.696745948211732755.tip-bot2@tip-bot2>
Date:   Mon, 16 Oct 2023 11:09:11 -0000
From:   "tip-bot2 for Uros Bizjak" <tip-bot2@...utronix.de>
To:     linux-tip-commits@...r.kernel.org
Cc:     Nadav Amit <namit@...are.com>, Uros Bizjak <ubizjak@...il.com>,
        Ingo Molnar <mingo@...nel.org>,
        Andy Lutomirski <luto@...nel.org>,
        Brian Gerst <brgerst@...il.com>,
        Denys Vlasenko <dvlasenk@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: [tip: x86/percpu] x86/percpu: Use C for arch_raw_cpu_ptr(), to
 improve code generation

The following commit has been merged into the x86/percpu branch of tip:

Commit-ID:     1d10f3aec2bb734b4b594afe8c1bd0aa656a7e4d
Gitweb:        https://git.kernel.org/tip/1d10f3aec2bb734b4b594afe8c1bd0aa656a7e4d
Author:        Uros Bizjak <ubizjak@...il.com>
AuthorDate:    Sun, 15 Oct 2023 22:24:40 +02:00
Committer:     Ingo Molnar <mingo@...nel.org>
CommitterDate: Mon, 16 Oct 2023 12:52:02 +02:00

x86/percpu: Use C for arch_raw_cpu_ptr(), to improve code generation

Implement arch_raw_cpu_ptr() in C to allow the compiler to perform
better optimizations, such as setting an appropriate base to compute
the address. The compiler is free to choose either MOV or ADD from
this_cpu_off address to construct the optimal final address.

There are some other issues when memory access to the percpu area is
implemented with an asm. Compilers can not eliminate asm common
subexpressions over basic block boundaries, but are extremely good
at optimizing memory access. By implementing arch_raw_cpu_ptr() in C,
the compiler can eliminate additional redundant loads from this_cpu_off,
further reducing the number of percpu offset reads from 1646 to 1631
on a test build, a -0.9% reduction.

Co-developed-by: Nadav Amit <namit@...are.com>
Signed-off-by: Nadav Amit <namit@...are.com>
Signed-off-by: Uros Bizjak <ubizjak@...il.com>
Signed-off-by: Ingo Molnar <mingo@...nel.org>
Cc: Andy Lutomirski <luto@...nel.org>
Cc: Brian Gerst <brgerst@...il.com>
Cc: Denys Vlasenko <dvlasenk@...hat.com>
Cc: H. Peter Anvin <hpa@...or.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Josh Poimboeuf <jpoimboe@...hat.com>
Cc: Uros Bizjak <ubizjak@...il.com>
Cc: Sean Christopherson <seanjc@...gle.com>
Link: https://lore.kernel.org/r/20231015202523.189168-2-ubizjak@gmail.com
---
 arch/x86/include/asm/percpu.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 915675f..5474690 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -49,6 +49,21 @@
 #define __force_percpu_prefix	"%%"__stringify(__percpu_seg)":"
 #define __my_cpu_offset		this_cpu_read(this_cpu_off)
 
+#ifdef CONFIG_USE_X86_SEG_SUPPORT
+/*
+ * Efficient implementation for cases in which the compiler supports
+ * named address spaces.  Allows the compiler to perform additional
+ * optimizations that can save more instructions.
+ */
+#define arch_raw_cpu_ptr(ptr)					\
+({								\
+	unsigned long tcp_ptr__;				\
+	tcp_ptr__ = __raw_cpu_read(, this_cpu_off);		\
+								\
+	tcp_ptr__ += (unsigned long)(ptr);			\
+	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;		\
+})
+#else /* CONFIG_USE_X86_SEG_SUPPORT */
 /*
  * Compared to the generic __my_cpu_offset version, the following
  * saves one instruction and avoids clobbering a temp register.
@@ -63,6 +78,8 @@
 	tcp_ptr__ += (unsigned long)(ptr);			\
 	(typeof(*(ptr)) __kernel __force *)tcp_ptr__;		\
 })
+#endif /* CONFIG_USE_X86_SEG_SUPPORT */
+
 #else /* CONFIG_SMP */
 #define __percpu_seg_override
 #define __percpu_prefix		""

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ