lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 16 Nov 2016 15:16:59 -0500
From:   Chris Metcalf <cmetcalf@...lanox.com>
To:     John Stultz <john.stultz@...aro.org>
CC:     Thomas Gleixner <tglx@...utronix.de>,
        Salman Qazi <sqazi@...gle.com>, "Paul Turner" <pjt@...gle.com>,
        Tony Lindgren <tony@...mide.com>,
        Steven Miao <realmz6@...il.com>,
        lkml <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v2] tile: avoid using clocksource_cyc2ns with absolute
 cycle count

On 11/16/2016 2:59 PM, John Stultz wrote:
> On Wed, Nov 16, 2016 at 11:35 AM, Chris Metcalf <cmetcalf@...lanox.com> wrote:
>> For large values of "mult" and long uptimes, the intermediate
>> result of "cycles * mult" can overflow 64 bits.  For example,
>> the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
>> we have mult = 853, and after 208.5 days, we overflow 64 bits.
>>
>> Since clocksource_cyc2ns() is intended to be used for relative
>> cycle counts, not absolute cycle counts, performance is more
>> importance than accepting a wider range of cycle values.
>> So, just use mult_frac() directly in tile's sched_clock().
>>
>> Signed-off-by: Chris Metcalf <cmetcalf@...lanox.com>
>> ---
>> Blackfin should make a similar change in their sched_clock().
>>
>>   arch/tile/kernel/time.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c
>> index 178989e6d3e3..ea960d660917 100644
>> --- a/arch/tile/kernel/time.c
>> +++ b/arch/tile/kernel/time.c
>> @@ -218,8 +218,8 @@ void do_timer_interrupt(struct pt_regs *regs, int fault_num)
>>    */
>>   unsigned long long sched_clock(void)
>>   {
>> -       return clocksource_cyc2ns(get_cycles(),
>> -                                 sched_clock_mult, SCHED_CLOCK_SHIFT);
>> +       return mult_frac(get_cycles(),
>> +                        sched_clock_mult, 1ULL << SCHED_CLOCK_SHIFT);
>>   }
> So... looking closer at mult_frac(), its a really slow implementation,
> doing 2 divs and a mod and a mult. Hopefully the compiler can sort out
> the divs are power of two, and optimize it out, but I'm still
> hesitant.
>
> sched_clock() is normally a very hot-path call, so this might have a
> real performance impact, especially compared to what its replacing.
>
> In your earlier patch, you mentioned this was similar to 4cecf6d401a0
> ("sched, x86: Avoid unnecessary overflow in
> sched_clock"). It might be better to actually try to use similar logic
> there, to make sure the performance impact is minimal.

This was the first thing I looked at when I saw the mult_frac()
implementation.  The modulus operations are indeed converted to
bitmasks and the divides to shifts.  We do have to do two multiplies
instead of one, but that's basically the worst of the cost.

Change 4cecf6d401a0 results in essentially identical code for x86 as
this proposed change does for tile.  In fact a follow-on change by
Salman introduced mult_frac() and switched to using it, so it was
identical at that point.

PeterZ (cc'ed) then improved it to use __int128 math via
mul_u64_u32_shr(), but that doesn't help tile; we only do one multiply
instead of two, but the multiply is handled by an out-of-line call to
__multi3, and the sched_clock() function ends up about 2.5x slower as
a result.

Thanks for thinking about this!

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ