Message-ID: <15D32E4A-E3AE-4AAB-A697-51C53B766F66@zytor.com>
Date: Thu, 22 Jan 2026 23:06:27 -0800
From: "H. Peter Anvin" <hpa@...or.com>
To: "Maciej W. Rozycki" <macro@...am.me.uk>
CC: David Desobry <david.desobry@...malgen.com>,
David Laight <david.laight.linux@...il.com>, tglx@...nel.org,
Ingo Molnar <mingo@...hat.com>, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] x86/lib: Optimize num_digits() and fix INT_MIN overflow
On January 21, 2026 3:51:29 AM PST, "Maciej W. Rozycki" <macro@...am.me.uk> wrote:
>On Tue, 20 Jan 2026, H. Peter Anvin wrote:
>
>> Now, for really silly optimization:
>>
>> int num_digits(unsigned int x)
>> {
>> 	int n = 0;
>> 	asm("cmp %2,%1; sbb $-2,%0" : "+r" (n) : "r" (x), "g" (10));
>> 	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100));
>> 	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000));
>> 	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000));
>> 	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000));
>> 	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000));
>> 	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (10000000));
>> 	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (100000000));
>> 	asm("cmp %2,%1; sbb $-1,%0" : "+r" (n) : "r" (x), "g" (1000000000));
>>
>> 	return n;
>> }
>>
>> No branches at all!
>
> I guess you chose to use SBB rather than the somewhat less mind-twisting
>ADC for the entertainment of the reader?
>
> Anyway branchless code can be produced from C code as well, e.g.:
>
>int num_digits(unsigned int x)
>{
>	return (1 + (x > 9) + (x > 99) + (x > 999) + (x > 9999) +
>		(x > 99999) + (x > 999999) + (x > 9999999) +
>		(x > 99999999) + (x > 999999999));
>}
>
>although GCC, at least as of version 11 which I have here, uses SETA rather
>than ADC/SBB (it doesn't care whether you write (x > 9) or (x >= 10), etc.),
>emitting a longer and likely slower sequence even at -Os. Likewise, the
>POWER backend doesn't take advantage of the carry flag and prefers
>calculations that shift the sign bit into bit 0. Evidently no one has
>thought of adding the right transformation to the optimiser, which might
>be an interesting challenge for someone.
>
> Maciej
No, I use it because SBB subtracts CF, whereas ADC adds CF.