Message-ID: <20250602102825-42aa84f0-23f1-4d10-89fc-e8bbaffd291a@linutronix.de>
Date: Mon, 2 Jun 2025 10:29:30 +0200
From: Thomas Weißschuh <thomas.weissschuh@...utronix.de>
To: Nathan Chancellor <nathan@...nel.org>
Cc: Nick Desaulniers <nick.desaulniers+lkml@...il.com>, 
	Bill Wendling <morbo@...gle.com>, Justin Stitt <justinstitt@...gle.com>, llvm@...ts.linux.dev, 
	linux-kernel@...r.kernel.org
Subject: [BUG?] clang miscompilation of inline ASM with overlapping
 input/output registers


Hi,

I observed surprising clang behavior around inline assembly and register
variables that differs from GCC.

Consider the following snippet:

	$ cat repro.c
	int main(void)
	{
		register long in asm("eax");
		register long out asm("eax");

		in = 0;
		asm volatile("nop" : "+r" (out) : "r" (in));

		return out;
	}

The relevant part is that the inline ASM takes an input and an output register
variable that are both bound to the same register, and only the input variable
is assigned before the asm statement.


Compile with clang (19.1.7, tested on godbolt.org with trunk):

	$ clang -O2 repro.c
	$ llvm-objdump --disassemble-symbols=main a.out
	0000000000001120 <main>:
	    1120: 90                           	nop
	    1121: c3                           	retq

The store to the variable "in" has been optimized away.


Compile with gcc (15.1.1, also tested on godbolt.org with trunk):

	$ gcc -O2 repro.c
	$ llvm-objdump --disassemble-symbols=main a.out
	0000000000001020 <main>:
	    1020: 31 c0                        	xorl	%eax, %eax
	    1022: 90                           	nop
	    1023: c3                           	retq
	    1024: 66 2e 0f 1f 84 00 00 00 00 00	nopw	%cs:(%rax,%rax)
	    102e: 66 90                        	nop

The store to "eax" is preserved.


As far as I can see, gcc is correct here: because the variable is used as an
input to the ASM, the compiler cannot optimize the store away.
The same effect can be observed on other architectures.


The real kernel example for this issue is in the loongarch vDSO code from
arch/loongarch/include/asm/vdso/gettimeofday.h:

	static __always_inline long clock_gettime_fallback(
						clockid_t _clkid,
						struct __kernel_timespec *_ts)
	{
		register clockid_t clkid asm("a0") = _clkid;
		register struct __kernel_timespec *ts asm("a1") = _ts;
		register long nr asm("a7") = __NR_clock_gettime;
		register long ret asm("a0");

		asm volatile(
		"       syscall 0\n"
		: "+r" (ret)
		: "r" (nr), "r" (clkid), "r" (ts)
		: "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
		 "$t8", "memory");

		return ret;
	}

Here both "clkid" and "ret" are stored in "a0". I can't point to the concrete
disassembly because the function is inlined into a much larger block of code,
and removing the inlining hides the bug.
Also, in my tests the bug only manifests for "_clkid" in the interval [16, 23];
other values work by chance.
Removing the aliasing by dropping "ret" and using "clkid" for both input and
output produces correct results.
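
For illustration, the same single-variable pattern applied to the x86 reproducer
rather than to the loongarch code could look roughly like the sketch below. The
function name nop_roundtrip() and the use of "rax" (picked to match the width of
long on x86-64) are my assumptions and appear nowhere in the kernel; on other
architectures the sketch just passes the value through. It only shows the shape
of the workaround and does not prove anything about what clang does with the
aliased version.

```c
/*
 * Sketch only: nop_roundtrip() and the choice of "rax" are illustrative
 * assumptions, not taken from the kernel sources.
 */
static long nop_roundtrip(long value)
{
#if defined(__x86_64__)
	/* One register variable serves as both the input and the output. */
	register long reg __asm__("rax") = value;

	/* "+r" marks reg as read and written, so its initial value is live. */
	__asm__ volatile("nop" : "+r" (reg));

	return reg;
#else
	/* Other architectures: no pinned register, plain pass-through. */
	return value;
#endif
}
```

Because there is only one variable bound to the register, there is no second,
uninitialized variable for the "+r" constraint to read from, which is the
difference to repro.c above.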

Is this a clang bug, is the code broken or am I missing something?


Thomas
