lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	24 Sep 2012 07:46:02 -0400
From:	"George Spelvin" <linux@...izon.com>
To:	linux@...izon.com, mpn@...gle.com, vda.linux@...glemail.com
Cc:	hughd@...gle.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 3/4] lib: vsprintf: Optimize put_dec_trunc8

>>  lib/vsprintf.c |   20 ++++++--------------
>>  1 file changed, 6 insertions(+), 14 deletions(-)
>>
>> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
>> index a8e7392..3ca77b8 100644
>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -174,20 +174,12 @@ char *put_dec_trunc8(char *buf, unsigned r)
>>  	unsigned q;
>>=20=20
>>  	/* Copy of previous function's body with added early returns */
>> -	q      = (r * (uint64_t)0x1999999a) >> 32;
>> -	*buf++ = (r - 10 * q) + '0'; /* 2 */
>> -	if (q == 0)
>> -		return buf;
>> -	r      = (q * (uint64_t)0x1999999a) >> 32;
>> -	*buf++ = (q - 10 * r) + '0'; /* 3 */
>> -	if (r == 0)
>> -		return buf;
>> -	q      = (r * (uint64_t)0x1999999a) >> 32;
>> -	*buf++ = (r - 10 * q) + '0'; /* 4 */
>> -	if (q == 0)
>> -		return buf;
>> -	r      = (q * (uint64_t)0x1999999a) >> 32;
>> -	*buf++ = (q - 10 * r) + '0'; /* 5 */
>> +	while (r >= 10000) {
>> +		q = r + '0';
>> +		r  = (r * (uint64_t)0x1999999a) >> 32;
>> +		*buf++ = q - 10*r;
>> +	}

> This loop looks nothing like the original code.  Why are you adding '0'
> at the beginning?

Because I was trying to avoid useless bare "move" instructions
when converting to a non-swapping loop, and putting it up
there made clear that the copy could be combined with a
useful add on a 3-operand machine.  (Or even a 2-operand with
lea.)

Compilers are probably smart enough to figure that out by themselves,
but it mirrors my thinking as I was writing the code.

While it is a bit subtle, it's only three lines of code, and
I figured the equivalence to
"q = r; r = <math>; +buf++ = q - 10*r + '0';"
was pretty easy to see.


> Also, the original code switches the role of q and r,
> the loop does not.

Well, obviously; I had to do that to make it possible to put
into a loop.

Truthfully, it would have made *more* sense to swap q and r globally,
so the loop had a more sensible q=quotient/r=remainder assignment,
but I wanted to show that the unmodified tail was in fact unmodified.

The big saving from using a loop is that it avoids unnecessary
32x32->64-bit multiplies, falling through to the 16x16->32-bit
code as early as possible.  Given that most numbers are small,
this seemed like a significant win.

(As long as it doesn't add additional unpredictable conditional
branches, which are also expensive.)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ