lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <81ed8b53-1a40-4777-ab87-4f4abe032dbc@citrix.com>
Date: Tue, 29 Apr 2025 20:13:37 +0100
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...nel.org>,
 Arnd Bergmann <arnd@...db.de>, Arnd Bergmann <arnd@...nel.org>,
 Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
 Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
 x86@...nel.org, Juergen Gross <jgross@...e.com>,
 Boris Ostrovsky <boris.ostrovsky@...cle.com>,
 Alexander Usyskin <alexander.usyskin@...el.com>,
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
 Mateusz Jończyk <mat.jonczyk@...pl>,
 Mike Rapoport <rppt@...nel.org>, Ard Biesheuvel <ardb@...nel.org>,
 Peter Zijlstra <peterz@...radead.org>, linux-kernel@...r.kernel.org,
 xen-devel@...ts.xenproject.org
Subject: Re: [PATCH] bitops/32: Convert variable_ffs() and fls() zero-case
 handling to C

On 29/04/2025 7:05 pm, Linus Torvalds wrote:
> On Tue, 29 Apr 2025 at 07:38, Andrew Cooper <andrew.cooper3@...rix.com> wrote:
>> I tried that.  (The thread started as a question around
>> __builtin_constant_p() but did grow to cover __builtin_ffs().)
> Maybe we could do something like
>
>    #define ffs(x) \
>         (statically_true((x) != 0) ? __ffs(x)+1 : __builtin_ffs(x))
>
> which uses our "statically_true()" helper that is actually fairly good
> at the whole "let the compiler tell us that it knows that value cannot
> be zero"
>
> I didn't check what code that generated, but I've seen gcc do well on
> that statically_true() thing in the past.
>
> Then we can just remove our current variable_ffs() thing entirely,
> because we now depend on our (good) __ffs() and the builtin being
> "good enough" for the bad case.

That would improve code generation for 32bit, but generally regress 64bit.

Preloading the destination register with -1 is better than the CMOV form
emitted by the builtin; BSF's habit of conditionally not writing the
destination register *is* a CMOV of sorts.


When I cleaned this up in Xen, there were several factors where I
thought improvements could be made.

Having both ffs() and __ffs(), where the latter is undefined in a common
case, is a trap waiting for an unwary programmer.  I have no particular
love for ffs() being off-by-one from normal, but is well defined for all
inputs.

Also, leaving the constant folding to the arch-optimised form means that
it often gets forgotten.  Therefore, I rearranged everything to have
this be common:

static always_inline attr_const unsigned int ffs(unsigned int x)
{
    if ( __builtin_constant_p(x) )
        return __builtin_ffs(x);

#ifdef arch_ffs
    return arch_ffs(x);
#else
    return generic_ffsl(x);
#endif
}

with most architectures implementing arch_ffs as:

#define arch_ffs(x) ((x) ? 1 + __builtin_ctz(x) : 0)

and x86 as:

static always_inline unsigned int arch_ffs(unsigned int x)
{
    unsigned int r;

    if ( __builtin_constant_p(x > 0) && x > 0 )
    {
        /*
         * A common code pattern is:
         *
         *     while ( bits )
         *     {
         *         bit = ffs(bits);
         *         ...
         *
         * and the optimiser really can work with the knowledge of x being
         * non-zero without knowing it's exact value, in which case we don't
         * need to compensate for BSF's corner cases.  Otherwise...
         */
        asm ( "bsf %[val], %[res]"
              : [res] "=r" (r)
              : [val] "rm" (x) );
    }
    else
    {
        /*
         * ... the AMD manual states that BSF won't modify the destination
         * register if x=0.  The Intel manual states that the result is
         * undefined, but the architects have said that the register is
         * written back with it's old value (zero extended as normal).
         */
        asm ( "bsf %[val], %[res]"
              : [res] "=r" (r)
              : [val] "rm" (x), "[res]" (-1) );
    }

    return r + 1;
}
#define arch_ffs arch_ffs

and finally, providing compatibility for the other forms as:

#define __ffs(x) (ffs(x) - 1)


The end result is fewer APIs to implement in arch-specific code, and the
removal of undefined behaviour.

That said, I don't envy anyone wanting to try and untangle this in
Linux, even if consensus were to agree on it as an approach.

~Andrew

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ