lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFULd4agDYNCJrQQ8bji09eGVJp1D46LH=fOpcfciza2qhKJfg@mail.gmail.com>
Date: Wed, 26 Mar 2025 09:46:15 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org, 
	Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 2/2] x86/bitops: Fix false output register dependency of
 TZCNT insn

On Tue, Mar 25, 2025 at 10:43 PM Ingo Molnar <mingo@...nel.org> wrote:
>
>
> * Uros Bizjak <ubizjak@...il.com> wrote:
>
> > On Haswell and later Intel processors, the TZCNT instruction appears
> > to have a false dependency on the destination register. Even though
> > the instruction only writes to it, the instruction will wait until
> > destination is ready before executing. This false dependency
> > was fixed for Skylake (and later) processors.
> >
> > Fix false dependency by clearing the destination register first.
> >
> > The x86_64 defconfig object size increases by 4215 bytes:
> >
> >           text           data     bss      dec            hex filename
> >       27342396        4642999  814852 32800247        1f47df7 vmlinux-old.o
> >       27346611        4643015  814852 32804478        1f48e7e vmlinux-new.o
>
> Yeah, so Skylake was released in 2015, about a decade ago.
>
> So we'd be making the kernel larger for an unquantified
> micro-optimization for CPUs that almost nobody uses anymore.
> That's a bad trade-off.

Yes, 4.2k seems a bit excessive. OTOH, I'd not say that the issue is a
micro-optimization, it is bordering on the hardware bug.

An alternative (also answering Peter's question) would be to make
instruction dependent only on the input (so, matching input and output
operand, making insn destructive), requiring MOV if the input operand
is to be reused. A quick test (x86_64 defconfig) with the attached
patch shows much less impact:

  text    data     bss     dec     hex filename
27564120        4642423  814852 33021395        1f7ddd3 vmlinux-old.o
27564679        4642423  814852 33021954        1f7e002 vmlinux-new.o

The increase is now 559 bytes, which IMHO is an acceptable trade-off.
Please also note that:

       asm("tzcnt %1,%0"
               : "=r" (word)
               : "r" (~word));
       return word;

emits NOT insn that is also destructive and already requires copying
of the input operand if the operand is to be reused.

Thanks,
Uros.

View attachment "p.diff.txt" of type "text/plain" (699 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ