lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <CAFULd4b=5Gh1YxJtq3bWKZFf_jCRrSFv5TurGDEP3ymCfjpPkQ@mail.gmail.com>
Date: Sun, 30 Mar 2025 08:54:13 +0200
From: Uros Bizjak <ubizjak@...il.com>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Ingo Molnar <mingo@...nel.org>, x86@...nel.org, linux-kernel@...r.kernel.org, 
	Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>, 
	Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH -tip 2/2] x86/hweight: Use POPCNT when available with
 X86_NATIVE_CPU option

On Sun, Mar 30, 2025 at 12:10 AM H. Peter Anvin <hpa@...or.com> wrote:
>
> On March 29, 2025 2:19:37 AM PDT, Uros Bizjak <ubizjak@...il.com> wrote:
> >
> >The proposed solution builds on the fact that with -march=native (and
> >also when -mpopcnt is specified on the command line), the compiler
> >signals the availability of certain ISA extensions by defining the
> >corresponding preprocessor macro (here, __POPCNT__). We can use this
> >macro to relax the asm constraints to fit the instruction itself rather
> >than the ABI of the fallback function call. On x86, the instruction can
> >also read its operand directly from memory, which avoids clobbering a
> >temporary input register.
> >
> >Without the fix for the (obsolete) false dependency, the change is simply:
> >
> >#ifdef __POPCNT__
> >     asm ("popcntl %[val], %[cnt]"
> >                 : [cnt] "=r" (res)
> >                 : [val] ASM_INPUT_RM (w));
> >#else
> >
> >Besides the reported savings of 600 bytes in the .text section (not
> >counting an additional 9k saved in the alternatives section), the
> >change also allows the register allocator to assign registers (and
> >take input arguments from memory) more optimally.
> >
> >The patch is also an example of how -march=native enables further
> >optimizations involving additional ISA extensions.
>
> If you have __POPCNT__ defined, could you not simply use __builtin_popcount()?
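A minimal, self-contained sketch of the approach quoted above, for trying it outside the kernel. The function name `popcnt_or_sw_hweight32()` is hypothetical: the patch itself edits the kernel's hweight helpers, uses the kernel's `ASM_INPUT_RM` constraint macro (replaced here by a plain `"rm"`), and its non-POPCNT path is the existing runtime-patched ("alternatives") call to a software fallback, which this sketch replaces with an in-line SWAR bit count. The `__POPCNT__` test happens at compile time, so a build with `-mpopcnt` (or `-march=native` on a POPCNT-capable x86 machine) selects the instruction and any other build uses the fallback.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's 32-bit __arch_hweight helper.
 * __POPCNT__ is predefined by GCC/Clang when -mpopcnt, or a -march that
 * implies it, is in effect.
 */
static inline uint32_t popcnt_or_sw_hweight32(uint32_t w)
{
#if defined(__POPCNT__) && (defined(__x86_64__) || defined(__i386__))
	uint32_t res;

	/*
	 * Constraints describe the instruction itself: the result may go in
	 * any register ("=r") and the input may come from a register or
	 * directly from memory ("rm"; the kernel patch uses ASM_INPUT_RM).
	 */
	__asm__ ("popcntl %[val], %[cnt]"
		 : [cnt] "=r" (res)
		 : [val] "rm" (w));
	return res;
#else
	/* Portable SWAR fallback: sum bits in 2-, 4-, then 8-bit fields. */
	w = w - ((w >> 1) & 0x55555555u);
	w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
	w = (w + (w >> 4)) & 0x0f0f0f0fu;
	/* Multiplying by 0x01010101 adds the four byte counts into the top byte. */
	return (w * 0x01010101u) >> 24;
#endif
}
```

Both paths return the same result; only the code generated for them differs, which is what the size savings quoted above come from.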

We can use it, but then the compiler (at least GCC) will start to emit
false-dependency fixups for POPCNT instructions, which we don't want.
TZCNT has the same problem, and there we agreed that it is not worth
fixing for 10-year-old CPUs [1].
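To make "false-dependency fixup" concrete: on some older Intel cores, POPCNT waits for the previous value of its destination register even though it overwrites it, and the usual fix is to zero the destination first. The sketch below writes that fixup by hand; it illustrates the general dependency-breaking technique, not the exact sequence GCC emits, and the function name is hypothetical. Note the early-clobber output (`"=&r"`), required because the destination is written before the input is read.

```c
#include <stdint.h>

/* Hypothetical helper illustrating a hand-written false-dependency fixup. */
static inline uint32_t popcnt_xor_hweight32(uint32_t w)
{
#if defined(__POPCNT__) && (defined(__x86_64__) || defined(__i386__))
	uint32_t res;

	/*
	 * Zeroing the destination first tells the CPU the old value of
	 * %[cnt] is dead, breaking the false output dependency. "=&r"
	 * (early clobber) keeps the compiler from placing w, or a register
	 * used to address w in memory, in the same register as res.
	 */
	__asm__ ("xorl %[cnt], %[cnt]\n\t"
		 "popcntl %[val], %[cnt]"
		 : [cnt] "=&r" (res)
		 : [val] "rm" (w));
	return res;
#else
	/* Fallback: clear the lowest set bit until none remain. */
	uint32_t n = 0;

	while (w) {
		w &= w - 1;
		n++;
	}
	return n;
#endif
}
```

The extra XOR costs an instruction and a register that cannot overlap the input, which is the overhead the reply above argues is not worth paying for CPUs this old.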

Please note that __builtin functions are not strictly required to be
inlined and can generate a library call [2]. I have been burned by
this with __builtin_parity(), so IMO the safest approach for the
POPCNT instruction is to use the ISA macros only to determine whether
the ISA extension is available.

Also note that optimizations with constant arguments are already
performed elsewhere (cf. <asm-generic/bitops/const_hweight.h>), and
__arch_hweight() receives only variable arguments, so the optimization
opportunities for a __builtin function are greatly reduced.
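The constant/variable split described above can be sketched as follows. All names here are hypothetical and only modelled on `<asm-generic/bitops/const_hweight.h>`: the GCC/Clang builtin `__builtin_constant_p()` routes compile-time constants to a macro that folds to a constant, so only variable arguments reach the architecture helper, which in this sketch is a simple loop standing in for the arch-specific code.

```c
#include <stdint.h>

/* Each term is a constant expression when w is, so the sum folds at compile time. */
#define SKETCH_CONST_HWEIGHT8(w)                 \
	((unsigned int)                          \
	 ((!!((w) & (1ULL << 0))) +              \
	  (!!((w) & (1ULL << 1))) +              \
	  (!!((w) & (1ULL << 2))) +              \
	  (!!((w) & (1ULL << 3))) +              \
	  (!!((w) & (1ULL << 4))) +              \
	  (!!((w) & (1ULL << 5))) +              \
	  (!!((w) & (1ULL << 6))) +              \
	  (!!((w) & (1ULL << 7)))))
#define SKETCH_CONST_HWEIGHT16(w) \
	(SKETCH_CONST_HWEIGHT8(w) + SKETCH_CONST_HWEIGHT8((w) >> 8))
#define SKETCH_CONST_HWEIGHT32(w) \
	(SKETCH_CONST_HWEIGHT16(w) + SKETCH_CONST_HWEIGHT16((w) >> 16))

/* Hypothetical run-time path, standing in for the arch hweight helper. */
static inline unsigned int sketch_arch_hweight32(uint32_t w)
{
	unsigned int n = 0;

	while (w) {
		w &= w - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Constants are folded by the macro; only variables reach the arch helper. */
#define sketch_hweight32(w)                                  \
	(__builtin_constant_p(w) ? SKETCH_CONST_HWEIGHT32(w) \
				 : sketch_arch_hweight32(w))
```

Because every term is an integer constant expression, `SKETCH_CONST_HWEIGHT32(0xF0F0F0F0u)` can even appear in a `_Static_assert`, which is why a builtin in the run-time helper gains little from constant folding.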

[1] https://lore.kernel.org/lkml/CAFULd4ZzoW+vP_pa1hEF--gvsG8yaPLU8S7oBkJBZLP4Tirepw@mail.gmail.com/
[2] https://lore.kernel.org/linux-mm/CAKbZUD0N7bkuw_Le3Pr9o1V2BjjcY_YiLm8a8DPceubTdZ00GQ@mail.gmail.com/

Uros.
