[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CALCETrXUaeNUbkQSeMPpPKWDBCEpqX1gLgkv2G9zLeeYMjK8VQ@mail.gmail.com>
Date: Mon, 14 Sep 2020 15:02:36 -0700
From: Andy Lutomirski <luto@...nel.org>
To: Nick Desaulniers <ndesaulniers@...gle.com>
Cc: Andy Lutomirski <luto@...nel.org>,
Bill Wendling <morbo@...gle.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" <x86@...nel.org>,
Greg Thelen <gthelen@...gle.com>,
John Sperbeck <jsperbeck@...gle.com>,
LKML <linux-kernel@...r.kernel.org>,
clang-built-linux <clang-built-linux@...glegroups.com>
Subject: Re: [PATCH] x86: use clang builtins to read and write eflags
On Mon, Sep 14, 2020 at 2:05 PM Nick Desaulniers
<ndesaulniers@...gle.com> wrote:
>
> (Bill, please use `./scripts/get_maintainer.pl <patchfile>` to get the
> appropriate list of folks and mailing lists to CC)
>
> On Thu, Sep 3, 2020 at 8:06 AM Andy Lutomirski <luto@...nel.org> wrote:
> >
> > So with my selftests hat on, the inline asm utterly sucks. Trying to
> > use pushfq / popfq violates the redzone and requires a gross hack to
> > work around. It also messes up the DWARF annotations. The kernel
> > gets a free pass on both of these because we don't use a redzone and
> > we don't use DWARF.
>
> Sorry, I don't fully understand:
> 1. What's the redzone?
Userspace ABI. x86_64 userspace is allowed to use 128 bytes below RSP.
> 2. How does pushfq/popfq violate it?
It clobbers a stack slot owned by the compiler.
> 3. What is the "gross hack"/"workaround?" (So that we might consider
> removing it if these builtins help).
Look in tools/testing/selftests/x86/helpers.h. I have a patch to
switch to the builtins.
> 4. The kernel does use DWARF, based on configs, right?
Indeed, but user code in the kernel tree (e.g. selftests) does.
> #ifdef CONFIG_X86_64
> #define __read_eflags __builtin_ia32_readeflags_u64
> #define __write_eflags __builtin_i32_writeeflags_u64
> #else
> #define __read_eflags __builtin_ia32_readeflags_u32
> #define __write_eflags __builtin_i32_writeeflags_u32
> #endif
Looks reasonable to me.
>
> Which would be concise. For smap_save() we can use clac() and stac()
> which might be nicer than `asm goto` (kudos for using `asm goto`, but
> plz no more). :^P Also, we can probably cleanup the note about GCC <
> 4.9 now. :)
I find it a bit implausible that popfq is faster than a branch, so the
smap_restore() code seems suboptimal. For smap_save(), I'm not sure
what's ideal, but the current code seems fine other than the bogus
constraint.
> > > Clang chooses the most general constraint when multiple constraints
> > > are specified. So it will use the stack when retrieving the flags.
> >
> > I haven't looked at clang's generated code to see if it's sensible
> > from an optimization perspective, but surely the kernel code right now
> > is genuinely wrong. GCC is free to omit the frame pointer and
> > generate a stack-relative access for the pop, in which case it will
> > malfunction.
>
> Sorry, this is another case I don't precisely follow, would you mind
> explaining it further to me please?
The compiler is permitted (and expected!) to instantiate an m
constraint as something like offset(%rsp). For example, this:
unsigned long bogus_read_flags(void)
{
unsigned long ret;
asm volatile ("pushfq\n\tpopq %0" : "=m" (ret));
return ret;
}
compiles to:
pushfq
popq -8(%rsp)
movq -8(%rsp), %rax
ret
which is straight-up wrong. Changing it to "=rm" gives:
pushfq
popq %rax
ret
which is correct, but this only works because gcc politely chose the r
option instead of the m option. clang chooses poorly and gives:
read_flags:
pushfq
popq -8(%rsp)
movq -8(%rsp), %rax
retq
which is wrong. I filed a clang bug:
https://bugs.llvm.org/show_bug.cgi?id=47530
but the kernel is buggy here -- clang is within its rights to generate
the bogus sequence above. Bill's email was IMO rather obfuscated, and
I assume this is what he meant.
Of course, clang unhelpfully generates poor code for the builtin, too:
nice_read_eflags:
pushq %rbp
movq %rsp, %rbp
pushfq
popq %rax
popq %rbp
retq
I filed a bug for that, too:
https://bugs.llvm.org/show_bug.cgi?id=47531
So we at least need to fix the bogus "=rm", and we should seriously
consider using the builtin.
Powered by blists - more mailing lists