Message-Id: <20220210223134.233757-1-morbo@google.com>
Date: Thu, 10 Feb 2022 14:31:34 -0800
From: Bill Wendling <morbo@...gle.com>
To: Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
"H . Peter Anvin" <hpa@...or.com>,
Nathan Chancellor <nathan@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>,
Juergen Gross <jgross@...e.com>,
Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>, llvm@...ts.linux.dev
Cc: linux-kernel@...r.kernel.org, Bill Wendling <morbo@...gle.com>
Subject: [PATCH v4] x86: use builtins to read eflags
GCC and Clang both have builtins to read and write the EFLAGS register.
Using them lets the compiler decide how to emit the access, which can
improve code generation.
This change was prompted by Clang's handling of the "=rm" constraint.
Clang chooses to be conservative in these situations, and so uses memory
instead of a register. This is a known issue, which is currently being
addressed.
However, using builtins is beneficial in general, because it removes the
burden of determining the best way to read the flags register from the
programmer and places it onto the compiler, which has the information
needed to make that decision. Indeed, this piece of code has had several
changes over the years, some of which went back and forth on the correct
constraints to use.
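As a rough illustration of the builtin approach outside the kernel, the
sketch below reads the flags register from user space. It is not part of
this patch: read_flags() and flags_irqs_enabled() are hypothetical names,
the __x86_64__ test stands in for CONFIG_X86_64, and the bit masks are
taken from the x86 architecture definition rather than from kernel
headers. It assumes an x86 target and a GCC or Clang new enough to
provide these builtins.

```c
#include <stdbool.h>

/* Assumption: architectural EFLAGS bit positions, not kernel macros. */
#define EFLAGS_FIXED 0x002UL /* bit 1: reserved, always reads as 1 */
#define EFLAGS_IF    0x200UL /* bit 9: interrupt enable flag */

/*
 * Hypothetical user-space analogue of native_save_fl(). No asm
 * constraints are written here; the compiler picks the destination
 * of the pop itself.
 */
static inline unsigned long read_flags(void)
{
#ifdef __x86_64__
	return __builtin_ia32_readeflags_u64();
#else
	return __builtin_ia32_readeflags_u32();
#endif
}

/* Hypothetical helper: true if the saved flags have IF set. */
static inline bool flags_irqs_enabled(unsigned long flags)
{
	return (flags & EFLAGS_IF) != 0;
}
```

The preprocessor selects the builtin because each variant is only
accepted when compiling for the matching word size.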
With this change, Clang generates better code:
Original code:
movq $0, -48(%rbp)
#APP
# __raw_save_flags
pushfq
popq -48(%rbp)
#NO_APP
movq -48(%rbp), %rbx
New code:
pushfq
popq %rbx
#APP
Note that the stack slot in the original code is no longer needed in the
new code, saving a small amount of stack space.
There is no change to GCC's output:
Original code:
# __raw_save_flags
pushf ; pop %r13 # flags
New code:
pushfq
popq %r13 # _23
Signed-off-by: Bill Wendling <morbo@...gle.com>
---
v4: - Clang no longer generates stack frames when using these builtins.
- Corrected misspellings.
v3: - Add blurb indicating that GCC's output hasn't changed.
v2: - Kept the original function to retain the out-of-line symbol.
- Improved the commit message.
- Note that I couldn't use Nick's suggestion of
return IS_ENABLED(CONFIG_X86_64) ? ...
because Clang complains about using __builtin_ia32_readeflags_u32 in
64-bit mode.
---
arch/x86/include/asm/irqflags.h | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index 87761396e8cc..f31a035f3c6a 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -19,20 +19,11 @@
extern inline unsigned long native_save_fl(void);
extern __always_inline unsigned long native_save_fl(void)
{
- unsigned long flags;
-
- /*
- * "=rm" is safe here, because "pop" adjusts the stack before
- * it evaluates its effective address -- this is part of the
- * documented behavior of the "pop" instruction.
- */
- asm volatile("# __raw_save_flags\n\t"
- "pushf ; pop %0"
- : "=rm" (flags)
- : /* no input */
- : "memory");
-
- return flags;
+#ifdef CONFIG_X86_64
+ return __builtin_ia32_readeflags_u64();
+#else
+ return __builtin_ia32_readeflags_u32();
+#endif
}
static __always_inline void native_irq_disable(void)
--
2.35.1.265.g69c8d7142f-goog