lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKwvOdk=VdDo1fhWJVa4eO0UjuQwtV9kC-cJd0J9-6guU2vafA@mail.gmail.com>
Date:   Mon, 7 Feb 2022 14:11:58 -0800
From:   Nick Desaulniers <ndesaulniers@...gle.com>
To:     Bill Wendling <morbo@...gle.com>
Cc:     Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H . Peter Anvin" <hpa@...or.com>,
        Nathan Chancellor <nathan@...nel.org>,
        Juergen Gross <jgross@...e.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>, llvm@...ts.linux.dev,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] x86: use builtins to read eflags

On Thu, Feb 3, 2022 at 4:57 PM Bill Wendling <morbo@...gle.com> wrote:
>
> GCC and Clang both have builtins to read and write the EFLAGS register.
> This allows the compiler to determine the best way to generate this
> code, which can improve code generation.
>
> This issue arose due to Clang's issue with the "=rm" constraint.  Clang
> chooses to be conservative in these situations, and so uses memory
> instead of registers. This is a known issue, which is currently being
> addressed.
>
> However, using builtins is benefiical in general, because it removes the

s/benefiical/beneficial/

> burden of determining what's the way to read the flags register from the
> programmer and places it on to the compiler, which has the information
> needed to make that decision. Indeed, this piece of code has had several
> changes over the years, some of which were pinging back and forth to
> determine the correct constraints to use.
>
> With this change, Clang generates better code:
>
> Original code:
>         movq    $0, -48(%rbp)
>         #APP
>         # __raw_save_flags
>         pushfq
>         popq    -48(%rbp)
>         #NO_APP
>         movq    -48(%rbp), %rbx
>
> New code:
>         pushfq
>         popq    %rbx
>         #APP

But it also forces frame pointers due to another bug in LLVM.
https://godbolt.org/z/6badWaGjo
For x86_64, we default to CONFIG_UNWINDER_ORC=y, not
CONFIG_UNWINDER_FRAME_POINTER=y.  So this change would make us use
registers instead of stack slots (improvement), but then force frame
pointers when we probably didn't need or want them (deterioration) for
all released versions of clang.

I think we should fix https://reviews.llvm.org/D92695 first before I'd
be comfortable signing off on this kernel change.  Again, I think we
should test out Phoebe's recommendation
https://reviews.llvm.org/D92695#inline-1086936
or do you already have a fix that I haven't yet been cc'ed on, perhaps?

>
> Note that the stack slot in the original code is no longer needed in the
> new code, saving a small amount of stack space.
>
> There is no change to GCC's ouput:
>
> Original code:
>
>         # __raw_save_flags
>         pushf ; pop %r13        # flags
>
> New code:
>
>         pushfq
>         popq    %r13    # _23
>
> Signed-off-by: Bill Wendling <morbo@...gle.com>
> ---
> v3: - Add blurb indicating that GCC's output hasn't changed.
> v2: - Kept the original function to retain the out-of-line symbol.
>     - Improved the commit message.
>     - Note that I couldn't use Nick's suggestion of
>
>         return IS_ENABLED(CONFIG_X86_64) ? ...
>
>       because Clang complains about using __builtin_ia32_readeflags_u32 in
>       64-bit mode.
> ---
>  arch/x86/include/asm/irqflags.h | 19 +++++--------------
>  1 file changed, 5 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
> index 87761396e8cc0..f31a035f3c6a9 100644
> --- a/arch/x86/include/asm/irqflags.h
> +++ b/arch/x86/include/asm/irqflags.h
> @@ -19,20 +19,11 @@
>  extern inline unsigned long native_save_fl(void);
>  extern __always_inline unsigned long native_save_fl(void)
>  {
> -       unsigned long flags;
> -
> -       /*
> -        * "=rm" is safe here, because "pop" adjusts the stack before
> -        * it evaluates its effective address -- this is part of the
> -        * documented behavior of the "pop" instruction.
> -        */
> -       asm volatile("# __raw_save_flags\n\t"
> -                    "pushf ; pop %0"
> -                    : "=rm" (flags)
> -                    : /* no input */
> -                    : "memory");
> -
> -       return flags;
> +#ifdef CONFIG_X86_64
> +       return __builtin_ia32_readeflags_u64();
> +#else
> +       return __builtin_ia32_readeflags_u32();
> +#endif
>  }
>
>  static __always_inline void native_irq_disable(void)
> --
> 2.35.0.263.gb82422642f-goog
>


-- 
Thanks,
~Nick Desaulniers

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ