Message-ID: <CAFULd4ZzoW+vP_pa1hEF--gvsG8yaPLU8S7oBkJBZLP4Tirepw@mail.gmail.com>
Date: Sat, 29 Mar 2025 09:48:14 +0100
From: Uros Bizjak <ubizjak@...il.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 2/2] x86/bitops: Fix false output register dependency of TZCNT insn

On Fri, Mar 28, 2025 at 11:28 PM Ingo Molnar <mingo@...nel.org> wrote:
>
>
> * Uros Bizjak <ubizjak@...il.com> wrote:
>
> > On Tue, Mar 25, 2025 at 10:43 PM Ingo Molnar <mingo@...nel.org> wrote:
> > >
> > >
> > > * Uros Bizjak <ubizjak@...il.com> wrote:
> > >
> > > > On Haswell and later Intel processors, the TZCNT instruction appears
> > > > to have a false dependency on the destination register. Even though
> > > > the instruction only writes to it, the instruction will wait until
> > > > destination is ready before executing. This false dependency
> > > > was fixed for Skylake (and later) processors.
> > > >
> > > > Fix false dependency by clearing the destination register first.
> > > >
> > > > The x86_64 defconfig object size increases by 4215 bytes:
> > > >
> > > > >     text    data    bss      dec     hex filename
> > > > > 27342396 4642999 814852 32800247 1f47df7 vmlinux-old.o
> > > > > 27346611 4643015 814852 32804478 1f48e7e vmlinux-new.o
> > >
> > > Yeah, so Skylake was released in 2015, about a decade ago.
> > >
> > > So we'd be making the kernel larger for an unquantified
> > > micro-optimization for CPUs that almost nobody uses anymore.
> > > That's a bad trade-off.
> >
> > Yes, 4.2k seems a bit excessive. OTOH, I wouldn't call the issue
> > a micro-optimization; it borders on a hardware bug.
>
> Has this been quantified, and do we really care about the
> micro-performance of ~10-year old CPUs, especially at the
> expense of modern CPUs?

No, it has not been quantified, although the change would be a
one-liner now. Without specially crafted benchmark loops the impact
is not noticeable, and typical kernel usage of these instructions is
not that sensitive to the destination register dependency.
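
A minimal sketch of the idea discussed in this thread (zeroing the
destination register before TZCNT so the instruction does not wait on
the register's previous value), assuming GCC-style inline asm on
x86-64. The name ctz_nodep() is hypothetical, and this is not the code
from the patch:

```c
/*
 * Hypothetical helper, for illustration only: count trailing zeros
 * without depending on the old contents of the output register.
 * The result is only meaningful for a non-zero input.
 */
static inline unsigned long ctz_nodep(unsigned long word)
{
#if defined(__x86_64__)
	unsigned long res;

	/*
	 * "xor %k0, %k0" zeroes the 32-bit view of the output register,
	 * which also clears the upper half and is a recognized
	 * dependency-breaking idiom. The early-clobber ("=&r") keeps the
	 * compiler from placing the input in the same register, since the
	 * output is written before the input is read.
	 */
	__asm__("xor %k0, %k0\n\t"
		"tzcnt %1, %0"
		: "=&r" (res)
		: "rm" (word)
		: "cc");
	return res;
#else
	/* Portable fallback (assumption: GCC/Clang builtin is available). */
	return (unsigned long)__builtin_ctzl(word);
#endif
}
```

For example, ctz_nodep(0x50) evaluates to 4, since bit 4 is the lowest
set bit. Note that every call site gains an extra two- or three-byte
XOR, which is the kind of size cost debated above.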

Thanks,
Uros.