Message-ID: <20250602102825-42aa84f0-23f1-4d10-89fc-e8bbaffd291a@linutronix.de>
Date: Mon, 2 Jun 2025 10:29:30 +0200
From: Thomas Weißschuh <thomas.weissschuh@...utronix.de>
To: Nathan Chancellor <nathan@...nel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@...il.com>,
Bill Wendling <morbo@...gle.com>, Justin Stitt <justinstitt@...gle.com>, llvm@...ts.linux.dev,
linux-kernel@...r.kernel.org
Subject: [BUG?] clang miscompilation of inline ASM with overlapping
input/output registers
Hi,
I observed a surprising behavior of clang around inline assembly and register
variables, differing from GCC.
Consider the following snippet:
$ cat repro.c
int main(void)
{
	register long in asm("eax");
	register long out asm("eax");

	in = 0;
	asm volatile("nop" : "+r" (out) : "r" (in));
	return out;
}
The relevant part is that the input and output operands of the inline ASM are
register variables bound to the same register, and only the input variable is
assigned a value.
Compile with clang (19.1.7, tested on godbolt.org with trunk):
$ clang -O2 repro.c
$ llvm-objdump --disassemble-symbols=main a.out
0000000000001120 <main>:
1120: 90 nop
1121: c3 retq
The store to the variable "in" has been optimized away.
Compile with gcc (15.1.1, also tested on godbolt.org with trunk):
$ gcc -O2 repro.c
$ llvm-objdump --disassemble-symbols=main a.out
0000000000001020 <main>:
1020: 31 c0 xorl %eax, %eax
1022: 90 nop
1023: c3 retq
1024: 66 2e 0f 1f 84 00 00 00 00 00 nopw %cs:(%rax,%rax)
102e: 66 90 nop
The store to "eax" is preserved.
As far as I can see, gcc is correct here. Since the variable is used as an input
to the ASM, the compiler cannot optimize the store away.
On other architectures the same effect can be observed.
The real kernel example for this issue is in the loongarch vDSO code from
arch/loongarch/include/asm/vdso/gettimeofday.h:
static __always_inline long clock_gettime_fallback(
					clockid_t _clkid,
					struct __kernel_timespec *_ts)
{
	register clockid_t clkid asm("a0") = _clkid;
	register struct __kernel_timespec *ts asm("a1") = _ts;
	register long nr asm("a7") = __NR_clock_gettime;
	register long ret asm("a0");

	asm volatile(
	" syscall 0\n"
	: "+r" (ret)
	: "r" (nr), "r" (clkid), "r" (ts)
	: "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
	  "$t8", "memory");

	return ret;
}
Here both "clkid" and "ret" are stored in "a0". I can't point to the concrete
disassembly here because it is inlined into a much larger block of code
and removing the inlining hides the bug.
Also in my tests the bug only manifests for "_clkid" in the interval [16, 23].
Other values work by chance.
Removing the aliasing by dropping "ret" and using "clkid" for both input and
output produces correct results.
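
For illustration, here is a minimal sketch of that non-aliasing form applied to
the small repro instead of the vDSO code. The helper names
nop_single_variable() and check_single_variable() are made up for this example,
"rax" is my choice of register for x86-64, and the ASM template is left empty
because the instruction itself does not matter here. On other architectures
the sketch simply lets the compiler pick the register.

```c
#include <assert.h>

/*
 * Sketch only: nop_single_variable() is a hypothetical helper, not kernel code.
 * One register variable serves as both the input and the output ("+r"),
 * so there is no second variable aliasing the same register.
 */
static long nop_single_variable(long value)
{
#if defined(__x86_64__)
	/* Assumption: pin the variable to "rax" on x86-64, similar to the repro. */
	register long reg __asm__("rax") = value;
#else
	/* Elsewhere, let the compiler choose the register. */
	long reg = value;
#endif

	/* "+r" marks reg as read and written, so its initial value is an input. */
	__asm__ __volatile__("" : "+r" (reg));
	return reg;
}

/* Usage: the value assigned before the ASM statement must come back out. */
static void check_single_variable(void)
{
	assert(nop_single_variable(0) == 0);
	assert(nop_single_variable(42) == 42);
}
```

Since the single variable is both read and written by the ASM, the compiler has
to keep the store that initializes it, which is the behavior I expected from
the two-variable version as well.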
Is this a clang bug, is the code broken or am I missing something?
Thomas