linux-kernel - Re: [PATCH v2] tile: avoid using clocksource

Message-ID: <16b10c48-7caf-8097-e9a0-adca64c57773@mellanox.com>
Date:   Fri, 18 Nov 2016 09:24:52 -0500
From:   Chris Metcalf <cmetcalf@...lanox.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     John Stultz <john.stultz@...aro.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Salman Qazi <sqazi@...gle.com>, Paul Turner <pjt@...gle.com>,
        Tony Lindgren <tony@...mide.com>,
        Steven Miao <realmz6@...il.com>,
        lkml <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] tile: avoid using clocksource_cyc2ns with absolute
 cycle count

On 11/18/2016 5:34 AM, Peter Zijlstra wrote:
> On Thu, Nov 17, 2016 at 03:00:14PM -0500, Chris Metcalf wrote:
>> On 11/17/2016 4:53 AM, Peter Zijlstra wrote:
>>> On Wed, Nov 16, 2016 at 03:16:59PM -0500, Chris Metcalf wrote:
>>>> PeterZ (cc'ed) then improved it to use __int128 math via
>>>> mul_u64_u32_shr(), but that doesn't help tile; we only do one multiply
>>>> instead of two, but the multiply is handled by an out-of-line call to
>>>> __multi3, and the sched_clock() function ends up about 2.5x slower as
>>>> a result.
>>> Well, only if you set CONFIG_ARCH_SUPPORTS_INT128, otherwise it reduces
>>> to 2 32x32->64 multiplications, of which one is conditional on there
>>> actually being bits set in the high word of the u64 argument.
>> I didn't notice that.  It took me down an interesting rathole.
>>
>> Obviously the branch optimization won't help on cycle counter values,
>> since we blow out of the low 32 bits in the first few seconds of
>> uptime.  So the conditional test won't help, but the 32x32
>> multiply optimizations should.
> Now, I don't quite remember things, but isn't it the idea to convert
> cycle deltas and accumulate in ns? That way you most always convert
> small values.

I would think you would also unnecessarily accumulate small errors.
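
To make the concern concrete, here is a minimal sketch (not tile or
kernel code) comparing the two approaches. The names cyc2ns_sketch,
ns_by_delta_accumulation and ns_by_absolute_scaling are made up for
illustration, as are the mult/shift values in the note below. It assumes
each delta is truncated independently; a scheme that carries the
fractional remainder forward would not lose precision this way.

```c
#include <assert.h>
#include <stdint.h>

/* ns = (cycles * mult) >> shift; assumes the product fits in 64 bits. */
static uint64_t cyc2ns_sketch(uint64_t cycles, uint32_t mult,
			      unsigned int shift)
{
	assert(shift < 64);
	return (cycles * mult) >> shift;
}

/* Convert each small delta on its own and sum the truncated results. */
static uint64_t ns_by_delta_accumulation(uint64_t step_cycles,
					 unsigned int steps,
					 uint32_t mult, unsigned int shift)
{
	uint64_t ns = 0;
	unsigned int i;

	for (i = 0; i < steps; i++)
		ns += cyc2ns_sketch(step_cycles, mult, shift);
	return ns;
}

/* Scale the absolute cycle count once, truncating only at the end. */
static uint64_t ns_by_absolute_scaling(uint64_t step_cycles,
				       unsigned int steps,
				       uint32_t mult, unsigned int shift)
{
	return cyc2ns_sketch(step_cycles * steps, mult, shift);
}
```

With mult = 5 and shift = 3 (0.625 ns per cycle), 1000 steps of 3 cycles
accumulate to 1000 ns, while scaling the absolute count of 3000 cycles
gives 1875 ns.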

The x86 sched_clock() seems to purely scale the current TSC value,
so what tile is doing is consistent with that, at least.
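
For reference, the fallback Peter describes above for the case without
CONFIG_ARCH_SUPPORTS_INT128 can be sketched in portable C roughly as
follows. This is an illustrative userspace version under a hypothetical
name, mul_u64_u32_shr_sketch, not the kernel source; it assumes
shift <= 32 and that the result fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute (a * mul) >> shift without a 128-bit type by splitting a
 * into 32-bit halves, so each multiply is 32x32->64.
 */
static uint64_t mul_u64_u32_shr_sketch(uint64_t a, uint32_t mul,
				       unsigned int shift)
{
	uint32_t al = (uint32_t)a;		/* low word */
	uint32_t ah = (uint32_t)(a >> 32);	/* high word */
	uint64_t ret;

	assert(shift <= 32);

	ret = ((uint64_t)al * mul) >> shift;
	/* The second multiply is needed only if the high word is non-zero. */
	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);
	return ret;
}
```

For an absolute cycle count the high word is non-zero almost from boot,
so both multiplies run; the gain over __int128 comes from each one being
a plain 32x32 multiply rather than an out-of-line call.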

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com