Message-ID: <CAK1hOcNViVbPH-p54vDaxDt8TKg27qfa=Z2md-EJ0K5qsLse=Q@mail.gmail.com>
Date:	Mon, 24 Sep 2012 17:02:15 +0200
From:	Denys Vlasenko <vda.linux@...glemail.com>
To:	George Spelvin <linux@...izon.com>
Cc:	hughd@...gle.com, linux-kernel@...r.kernel.org, mina86@...a86.com
Subject: Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000

On Mon, Sep 24, 2012 at 2:35 PM, George Spelvin <linux@...izon.com> wrote:
>> Here is the comparison of the x86-32 assembly
>> of the fragment which does "x / 10000" thing,
>> before and after the patch:
>
>> -01 c6                  add    %eax,%esi
>> -b8 59 17 b7 d1         mov    $0xd1b71759,%eax
>> -f7 e6                  mul    %esi
>> -89 d3                  mov    %edx,%ebx
>> -89 f2                  mov    %esi,%edx
>> -c1 eb 0d               shr    $0xd,%ebx
>>
>> +01 c7                  add    %eax,%edi
>> +b8 d7 c5 6d 34         mov    $0x346dc5d7,%eax
>> +f7 e7                  mul    %edi
>> +89 55 e8               mov    %edx,-0x18(%ebp)
>> +8b 5d e8               mov    -0x18(%ebp),%ebx
>> +89 fa                  mov    %edi,%edx
>> +89 45 e4               mov    %eax,-0x1c(%ebp)
>> +c1 eb 0b               shr    $0xb,%ebx
>>
>> Poor gcc got confused, and generated somewhat
>> worse code (spilling and immediately reloading upper
>> part of 32x32->64 multiply).
>
>> Please test and benchmark your changes to this code
>> before submitting them.
>
> Thanks for the feedback!  It very much *was* intended to start a
> conversation with you, but the 7 week response delay somewhat interfered
> with that process.
>
> I was playing with it on ARM, where the results are a bit different.
>
> As you can see, it fell out of some other work which *did* make a
> useful difference.  I just hadn't tested this change in isolation,

Please find attached source of test program.

You need to touch test_new.c (or make a few copies of it
to experiment with different versions of code). test_header.c
and test_main.c contain benchmarking code and need not be modified.

It also includes a verification step, which would catch the bugs
you had in your patches.

Usage:

$ gcc [--static] [-m32] -O2 -Wall test_new.c -otest_new
$ ./test_new
Conversions per second: 8:59964000 123:48272000 123456:37852000
12345678:34216000 123456789:23528000 2^32:23520000 2^64:17616000
Conversions per second: 8:60092000 123:48536000 123456:37836000
12345678:33924000 123456789:23580000 2^32:23372000 2^64:17608000
Conversions per second: 8:60084000 123:48396000 123456:37840000
12345678:34192000 123456789:23564000 2^32:23484000 2^64:17612000
Conversions per second: 8:60108000 123:48500000 123456:37872000
12345678:33996000 123456789:23576000 2^32:23524000 2^64:17612000
Tested 14680064      ^C
^^^^^^^^^^^^^^^^^^^^^^^^ tests correctness until user interrupts it
$

View attachment "test_header.c" of type "text/x-csrc" (1708 bytes)

View attachment "test_main.c" of type "text/x-csrc" (1766 bytes)

View attachment "test_new.c" of type "text/x-csrc" (10005 bytes)
