Message-ID: <CAK1hOcNViVbPH-p54vDaxDt8TKg27qfa=Z2md-EJ0K5qsLse=Q@mail.gmail.com>
Date: Mon, 24 Sep 2012 17:02:15 +0200
From: Denys Vlasenko <vda.linux@...glemail.com>
To: George Spelvin <linux@...izon.com>
Cc: hughd@...gle.com, linux-kernel@...r.kernel.org, mina86@...a86.com
Subject: Re: [PATCH 2/4] lib: vsprintf: Optimize division by 10000
On Mon, Sep 24, 2012 at 2:35 PM, George Spelvin <linux@...izon.com> wrote:
>> Here is the comparison of the x86-32 assembly
>> of the fragment which does "x / 10000" thing,
>> before and after the patch:
>
>> -01 c6 add %eax,%esi
>> -b8 59 17 b7 d1 mov $0xd1b71759,%eax
>> -f7 e6 mul %esi
>> -89 d3 mov %edx,%ebx
>> -89 f2 mov %esi,%edx
>> -c1 eb 0d shr $0xd,%ebx
>>
>> +01 c7 add %eax,%edi
>> +b8 d7 c5 6d 34 mov $0x346dc5d7,%eax
>> +f7 e7 mul %edi
>> +89 55 e8 mov %edx,-0x18(%ebp)
>> +8b 5d e8 mov -0x18(%ebp),%ebx
>> +89 fa mov %edi,%edx
>> +89 45 e4 mov %eax,-0x1c(%ebp)
>> +c1 eb 0b shr $0xb,%ebx
>>
>> Poor gcc got confused, and generated somewhat
>> worse code (spilling and immediately reloading upper
>> part of 32x32->64 multiply).
>
>> Please test and benchmark your changes to this code
>> before submitting them.
>
> Thanks for the feedback! It very much *was* intended to start a
> conversation with you, but the 7 week response delay somewhat interfered
> with that process.
>
> I was playing with it on ARM, where the results are a bit different.
>
> As you can see, it fell out of some other work which *did* make a
> useful difference. I just hadn't tested this change in isolation,
Please find attached the source of a test program.
Only test_new.c needs to be edited (or make a few copies of it
to experiment with different versions of the code). test_header.c
and test_main.c contain the benchmarking code and need not be modified.
The program also includes a verification step, which would have caught
the bugs you had in your patches.
Usage:
$ gcc [ --static] [-m32] -O2 -Wall test_new.c -otest_new
$ ./test_new
Conversions per second: 8:59964000 123:48272000 123456:37852000
12345678:34216000 123456789:23528000 2^32:23520000 2^64:17616000
Conversions per second: 8:60092000 123:48536000 123456:37836000
12345678:33924000 123456789:23580000 2^32:23372000 2^64:17608000
Conversions per second: 8:60084000 123:48396000 123456:37840000
12345678:34192000 123456789:23564000 2^32:23484000 2^64:17612000
Conversions per second: 8:60108000 123:48500000 123456:37872000
12345678:33996000 123456789:23576000 2^32:23524000 2^64:17612000
Tested 14680064 ^C
^^^^^^^^^^^^^^^^^^^^^^^^ tests correctness until user interrupts it
$
View attachment "test_header.c" of type "text/x-csrc" (1708 bytes)
View attachment "test_main.c" of type "text/x-csrc" (1766 bytes)
View attachment "test_new.c" of type "text/x-csrc" (10005 bytes)