Message-ID: <CAHk-=wg=Wdct5f9W2-tvwfRefv3xmw1-9Ko+RG+6=xjLu4ndFg@mail.gmail.com>
Date: Mon, 8 Apr 2024 12:42:31 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>, Thomas Gleixner <tglx@...utronix.de>, Peter Anvin <hpa@...or.com>,
"the arch/x86 maintainers" <x86@...nel.org>, Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: More annoying code generation by clang
On Mon, 8 Apr 2024 at 11:32, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> It's been reported long ago, it seems to be hard to fix.
>
> I suspect the issue is that the inline asm format is fairly closely
> related to the gcc machine descriptions (look at the machine
> descriptor files in gcc, and if you can ignore the horrid LISP-style
> syntax you see how close they are).
Actually, one of the GitHub issue pages has more of an explanation
(and yes, it's tied to an impedance mismatch between the inline asm
syntax and how clang works):
https://github.com/llvm/llvm-project/issues/20571#issuecomment-980933442
so I wrote more of a commit log and did that "ASM_SOURCE_G" thing
(except I decided to call it "input" instead of "source", since that's
the standard inline asm language).
This version also has that output size fixed, and the commit message
talks about it.
This does *not* fix other inline asms to use "ASM_INPUT_G/RM".
I think it's mainly some of the bitop code that people have noticed
before - fls and variable_ffs() and friends.
I suspect clang is more common in the arm64 world than it is for
x86-64 kernel developers, and arm64 inline asm basically never uses
"rm" or "g" since arm64 doesn't have instructions that take either a
register or a memory operand.
Anyway, with gcc this generates
        cmp (%rdx),%ebx; sbb %rax,%rax  # _7->max_fds, fd, __mask
IOW, it uses the memory location for "max_fds". It couldn't do that
before, because it used to think that it always had to do the compare
in 64 bits, and the memory location is only 32-bit.
With clang, this generates
        movl    (%rcx), %eax
        cmpl    %eax, %edi
        sbbq    %rdi, %rdi
which has that extra register use, but is at least much better than
what it used to generate with crazy "load into register, spill to
stack, then compare against stack contents".
Linus
View attachment "0001-x86-improve-array_index_mask_nospec-code-generation.patch" of type "text/x-patch" (4554 bytes)