Date:   Fri, 9 Dec 2016 12:32:07 -0500
From:   Chris Metcalf <cmetcalf@...lanox.com>
To:     Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>
CC:     LKML <linux-kernel@...r.kernel.org>,
        John Stultz <john.stultz@...aro.org>,
        Ingo Molnar <mingo@...nel.org>,
        David Gibson <david@...son.dropbear.id.au>,
        Liav Rehana <liavr@...lanox.com>,
        Richard Cochran <richardcochran@...il.com>,
        Prarit Bhargava <prarit@...hat.com>,
        Laurent Vivier <lvivier@...hat.com>,
        "Christopher S. Hall" <christopher.s.hall@...el.com>
Subject: Re: [patch 5/6] [RFD] timekeeping: Provide optional 128bit math

On 12/9/2016 3:30 AM, Peter Zijlstra wrote:
> On Fri, Dec 09, 2016 at 07:38:47AM +0100, Peter Zijlstra wrote:
>> On Fri, Dec 09, 2016 at 06:26:38AM +0100, Peter Zijlstra wrote:
>>> Just for giggles, on tilegx the branch is actually slower than doing the
>>> mult unconditionally.
>>>
>>> The problem is that the two multiplies would otherwise completely
>>> pipeline, whereas with the conditional you serialize them.
>> On my Haswell laptop the unconditional version is faster too.
> Only when using x86_64 instructions, once I fixed the i386 variant it
> was slower, probably due to register pressure and the like.
>
>>> (came to light while talking about why the mul_u64_u32_shr() fallback
>>> didn't work right for them, which was a combination of the above issue
>>> and the fact that their compiler 'lost' the fact that these are
>>> 32x32->64 mults and did 64x64 ones instead).
>> Turns out using GCC-6.2.1 we have the same problem on i386, GCC doesn't
>> recognise the 32x32 mults and generates crap.
>>
>> This used to work :/
> Do we want something like so?
>
> ---
>   arch/tile/include/asm/Kbuild  |  1 -
>   arch/tile/include/asm/div64.h | 14 ++++++++++++++
>   arch/x86/include/asm/div64.h  | 10 ++++++++++
>   include/linux/math64.h        | 26 ++++++++++++++++++--------
>   4 files changed, 42 insertions(+), 9 deletions(-)

Untested, but I looked at it closely, and it seems like a decent idea.

Acked-by: Chris Metcalf <cmetcalf@...lanox.com> [for tile]

Of course if this is pushed up, it will then probably be too tempting for me not
to add the tilegx-specific mul_u64_u32_shr() to take advantage of pipelining
the two 32x32->64 multiplies :-)
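
For readers following the thread, here is a minimal user-space sketch of the two fallback shapes being compared. It is not the patch under discussion: the function names are hypothetical, the fixed-width types come from <stdint.h> rather than the kernel's u32/u64, and it assumes 0 <= shift <= 32 and that the shifted result fits in 64 bits. The explicit uint32_t operands with a cast to uint64_t are the usual way to hint to the compiler that each product is a 32x32->64 multiply.

```c
#include <stdint.h>

/*
 * Conditional variant (hypothetical name): the high-half multiply is
 * skipped when the upper 32 bits of 'a' are zero.
 * Assumes 0 <= shift <= 32.
 */
static inline uint64_t mul_u64_u32_shr_branch(uint64_t a, uint32_t mul,
					      unsigned int shift)
{
	uint32_t al = (uint32_t)a;		/* low 32 bits of a */
	uint32_t ah = (uint32_t)(a >> 32);	/* high 32 bits of a */
	uint64_t ret;

	ret = ((uint64_t)al * mul) >> shift;
	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);

	return ret;
}

/*
 * Unconditional variant (hypothetical name): both 32x32->64 multiplies
 * are always issued, so they do not depend on a branch outcome and can
 * overlap in the pipeline. Assumes 0 <= shift <= 32.
 */
static inline uint64_t mul_u64_u32_shr_nobranch(uint64_t a, uint32_t mul,
						unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);

	return (((uint64_t)al * mul) >> shift) +
	       (((uint64_t)ah * mul) << (32 - shift));
}
```

Both variants return the same value for the same inputs; which one is faster depends on the architecture and compiler, as the measurements quoted above suggest.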

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com
