lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Mon, 24 Sep 2012 11:03:57 +0200
From:	Denys Vlasenko <vda.linux@...glemail.com>
To:	George Spelvin <linux@...izon.com>
Cc:	mina86@...a86.com, hughd@...gle.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

On Fri, Aug 3, 2012 at 7:21 AM, George Spelvin <linux@...izon.com> wrote:
> The same multiply-by-inverse technique can be used to
> convert division by 10000 to a 32x32->64-bit multiply.
>
> Signed-off-by: George Spelvin <linux@...izon.com>
> ---
>  lib/vsprintf.c |   60 +++++++++++++++++++++++++++++++-------------------------
>  1 file changed, 33 insertions(+), 27 deletions(-)
>
> This is something of an RFC, I haven't benchmarked the helper
> function.  But it sure cleans up the code!

Can you please do that before you send alterations
to the code which *was* benchmarked?


> +static
> +unsigned put_dec_helper4(char *buf, unsigned x)
> +{
> +        uint32_t q = (x * (uint64_t)0x346DC5D7) >> 43;
> +
> +        put_dec_full4(buf, x - q * 10000);
> +        return q;
>  }
>
>  /* Based on code by Douglas W. Jones found at
> @@ -277,28 +292,19 @@ char *put_dec(char *buf, unsigned long long n)
>         d3  = (h >> 16); /* implicit "& 0xffff" */
>
>         q   = 656 * d3 + 7296 * d2 + 5536 * d1 + ((uint32_t)n & 0xffff);
> +       q = put_dec_helper4(buf, q);
> +
> +       q += 7671 * d3 + 9496 * d2 + 6 * d1;
> +       q = put_dec_helper4(buf+4, q);
> +
> +       q += 4749 * d3 + 42 * d2;
> +       q = put_dec_helper4(buf+8, q);
>
> -       buf = put_dec_full4(buf, q % 10000);
> -       q   = q / 10000;
> -
> -       d1  = q + 7671 * d3 + 9496 * d2 + 6 * d1;
> -       buf = put_dec_full4(buf, d1 % 10000);
> -       q   = d1 / 10000;
> -
> -       d2  = q + 4749 * d3 + 42 * d2;
> -       buf = put_dec_full4(buf, d2 % 10000);
> -       q   = d2 / 10000;
> -
> -       d3  = q + 281 * d3;
> -       if (!d3)
> -               goto done;
> -       buf = put_dec_full4(buf, d3 % 10000);
> -       q   = d3 / 10000;
> -       if (!q)
> -               goto done;
> -       buf = put_dec_full4(buf, q);
> - done:
> -       while (buf[-1] == '0')
> +       q += 281 * d3;
> +       buf += 12;
> +       if (q)
> +               buf = put_dec_trunc8(buf, q);
> +       else while (buf[-1] == '0')
>                 --buf;

gcc was already doing that optimization for us.
It will use a larger constant and shift, but that
changes neither size nor speed.

Moreover, put_dec_helper4 will get inlined,
so the generated assembly code will be very similar.

Here is the comparison of the x86-32 assembly
of the fragment which does "x / 10000" thing,
before and after the patch:

-01 c6                  add    %eax,%esi
-b8 59 17 b7 d1         mov    $0xd1b71759,%eax
-f7 e6                  mul    %esi
-89 d3                  mov    %edx,%ebx
-89 f2                  mov    %esi,%edx
-c1 eb 0d               shr    $0xd,%ebx

+01 c7                  add    %eax,%edi
+b8 d7 c5 6d 34         mov    $0x346dc5d7,%eax
+f7 e7                  mul    %edi
+89 55 e8               mov    %edx,-0x18(%ebp)
+8b 5d e8               mov    -0x18(%ebp),%ebx
+89 fa                  mov    %edi,%edx
+89 45 e4               mov    %eax,-0x1c(%ebp)
+c1 eb 0b               shr    $0xb,%ebx

Poor gcc got confused, and generated somewhat
worse code (spilling and immediately reloading upper
part of 32x32->64 multiply).

Please test and benchmark your changes to this code
before submitting them.

-- 
vda
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists