Date:   Thu, 12 May 2022 09:03:38 +0900
From:   Vincent Mailhol <mailhol.vincent@...adoo.fr>
To:     Nick Desaulniers <ndesaulniers@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>
Cc:     Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H . Peter Anvin" <hpa@...or.com>,
        Nathan Chancellor <nathan@...nel.org>,
        Tom Rix <trix@...hat.com>, linux-kernel@...r.kernel.org,
        llvm@...ts.linux.dev, David Howells <dhowells@...hat.com>,
        Jan Beulich <JBeulich@...e.com>,
        Vincent Mailhol <mailhol.vincent@...adoo.fr>
Subject: [PATCH v3 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions

The compilers provide builtin functions equivalent to the kernel's
ffs(), __ffs() and ffz() functions. The kernel uses optimized
assembly which produces better code than the builtins. However, such
assembly code cannot be optimized when used on constant expressions.

This series relies on __builtin_constant_p to select the optimal
solution (a sketch of the pattern is given after the list):

  * use the kernel assembly for non-constant expressions;

  * use the compiler's __builtin functions for constant expressions.
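
As an illustration, here is a minimal, self-contained sketch of the
dispatch pattern (a user-space approximation: variable_ffs() below
mimics the kernel's BSF-based helper and is my stand-in, not the
exact kernel code):

  static inline int variable_ffs(int x)
  {
          int r;

          /* BSF leaves the destination unchanged when the source is
           * 0, so seeding it with -1 makes ffs(0) return 0 (the
           * kernel already relies on this de facto BSF behavior).
           */
          asm("bsfl %1,%0" : "=r" (r) : "rm" (x), "0" (-1));
          return r + 1;
  }

  #define ffs(x)                                      \
          (__builtin_constant_p(x) ?                  \
           __builtin_ffs(x) : variable_ffs(x))

With this, a call such as ffs(0x80) folds to the constant 8 at
compile time, while runtime values still go through the
inline-assembly path.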

I also think that fls() and fls64() can be optimized in a similar
way, using __builtin_clz() and __builtin_clzll(), but it is a bit
less trivial so I want to focus on this series first. If it gets
accepted, I will then work on those two additional functions.
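
For reference, a hypothetical sketch of that mapping (the
constant_fls() name is mine, not from the series); the extra wrinkle
is that fls() must return 0 for an input of 0 whereas
__builtin_clz(0) is undefined:

  /* Hypothetical constant-expression variant of fls(); assumes a
   * 32-bit int. __builtin_clz(0) is undefined, hence the explicit
   * zero check.
   */
  #define constant_fls(x) ((x) ? 32 - __builtin_clz(x) : 0)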


** Statistics **

Patch 1/2 optimizes 26.7% of ffs() calls and patch 2/2 optimizes 27.9%
of __ffs() and ffz() calls (details of the calculation in each patch).
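
For patch 2/2, the constant-expression mapping is the one stated in
the patch title (a hedged sketch; the constant_* names are mine):
__ffs() corresponds directly to __builtin_ctzl(), and ffz(x), which
looks for the first cleared bit, can reuse it on the complement:

  /* Both are undefined when no matching bit exists (x == 0 for
   * __ffs(), x == ~0UL for ffz()), just like the kernel versions.
   */
  #define constant___ffs(x) __builtin_ctzl(x)
  #define constant_ffz(x)   __builtin_ctzl(~(x))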


** Changelog **

v2 -> v3:

  * Redacted the instructions after ret and before the next function
    in the assembly output.

  * Added a note and a link to Nick's message about the
    constant-propagation missed optimization in clang:
    https://lore.kernel.org/all/CAKwvOdnH_gYv4qRN9pKY7jNTQK95xNeH1w1KZJJmvCkh8xJLBg@mail.gmail.com/

  * Fix a copy/paste typo in the statistics of patch 1. The number of
    occurrences before the patches is 1081, not 3607 (the 26.7%
    reduction remains correct).

  * Rename the functions as follows:
    - __variable_ffs() -> variable___ffs()
    - __variable_ffz() -> variable_ffz()

v1 -> v2:

  * Use the ORC unwinder for the produced assembly code in patch 1.

  * Rename the functions as follows:
     - __ffs_asm() -> variable_ffs()
     - __ffs_asm_not_zero() -> __variable_ffs()
     - ffz_asm() -> __variable_ffz()

  * Fit #define ffs(x) on a single line.

  * Correct the statistics for ffs() in patch 1 and add the statistics
    for __ffs() and ffz() in patch 2.

Vincent Mailhol (2):
  x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant
    expressions
  x86/asm/bitops: __ffs,ffz: use __builtin_ctzl to evaluate constant
    expressions

 arch/x86/include/asm/bitops.h | 64 +++++++++++++++++++++--------------
 1 file changed, 38 insertions(+), 26 deletions(-)

-- 
2.35.1
