Message-ID: <15D32E4A-E3AE-4AAB-A697-51C53B766F66@zytor.com>
Date: Thu, 22 Jan 2026 23:06:27 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: "Maciej W. Rozycki" <macro@...am.me.uk>
CC: David Desobry <david.desobry@...malgen.com>,
        David Laight <david.laight.linux@...il.com>, tglx@...nel.org,
        Ingo Molnar <mingo@...hat.com>, bp@...en8.de,
        dave.hansen@...ux.intel.com, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow

On January 21, 2026 3:51:29 AM PST, "Maciej W. Rozycki" <macro@...am.me.uk> wrote:
>On Tue, 20 Jan 2026, H. Peter Anvin wrote:
>
>> Now, for really silly optimization:
>> 
>> int num_digits(unsigned int x)
>> {
>>     int n = 0;
>>     asm("cmp %2,%1; sbb $-2,%0" : "+r" (n) : "r" (x), "g" (10));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000000));
>>     asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000000));
>> 
>>     return n;
>> }
>> 
>> No branches at all!
>
> I guess you chose to use SBB rather than the somewhat less mind-twisting
>ADC for the entertainment of the reader?
>
> Anyway branchless code can be produced from C code as well, e.g.:
>
>int num_digits(unsigned int x)
>{
>	return (1 + (x > 9) + (x > 99) + (x > 999) + (x > 9999) +
>		(x > 99999) + (x > 999999) + (x > 9999999) +
>		(x > 99999999) + (x > 999999999));
>}
>
>although GCC, at least as of version 11 which I have here, uses SETA rather
>than ADC/SBB (it doesn't care whether you write (x > 9) or (x >= 10), etc.),
>emitting a longer and likely slower sequence even at -Os.  Likewise the POWER
>backend doesn't take advantage of the carry flag and prefers calculations
>involving shifting the sign bit into bit 0.  Apparently no one has
>thought of adding the right transformation to the optimiser, which might
>be an interesting challenge for someone.
>
>  Maciej

No, I use it because SBB subtracts CF, whereas ADC adds CF.
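
To spell out the arithmetic, here is a rough portable C model of the asm
sequence quoted above. It is only a sketch: the helper names (sbb_step,
num_digits_sbb_model, num_digits_ref, self_check) are made up for
illustration, and it assumes unsigned int is at least 32 bits wide. CMP sets
CF when x is below the limit, and SBB with an immediate of -1 then computes
n - (-1) - CF, i.e. n + 1 - CF, so n grows only when x has reached the limit.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical model of one "cmp $limit, x; sbb $imm, n" pair.
 * CF is the unsigned borrow of x - limit, i.e. set when x < limit. */
int sbb_step(int n, unsigned int x, unsigned int limit, int imm)
{
	int cf = x < limit;	/* carry flag after the compare */

	return n - imm - cf;	/* imm == -1 yields n + 1 - cf */
}

/* Same sequence of steps as the asm version; assumes 32-bit unsigned int. */
int num_digits_sbb_model(unsigned int x)
{
	unsigned int limit = 10;
	int n = sbb_step(0, x, limit, -2);	/* first step yields 1 or 2 */
	int i;

	for (i = 0; i < 8; i++) {
		limit *= 10;			/* 100 ... 1000000000 */
		n = sbb_step(n, x, limit, -1);
	}
	return n;
}

/* Plain divide-by-ten loop, used only as a reference for checking. */
int num_digits_ref(unsigned int x)
{
	int n = 1;

	while (x >= 10) {
		x /= 10;
		n++;
	}
	return n;
}

/* Compare the model against the reference at the digit boundaries. */
void self_check(void)
{
	unsigned int limit = 10;
	int i;

	assert(num_digits_sbb_model(0) == num_digits_ref(0));
	assert(num_digits_sbb_model(UINT_MAX) == num_digits_ref(UINT_MAX));
	for (i = 0; i < 9; i++) {
		assert(num_digits_sbb_model(limit - 1) == num_digits_ref(limit - 1));
		assert(num_digits_sbb_model(limit) == num_digits_ref(limit));
		if (i < 8)
			limit *= 10;
	}
}
```

Calling self_check() from a test harness exercises every power-of-ten
boundary. An ADC-based variant would add CF instead, so the comparison would
have to be arranged to set CF when x is at or above the limit rather than
below it.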
