Message-ID: <486a28a2-b7de-67fd-f731-1487b141319b@mellanox.com>
Date: Fri, 9 Dec 2016 12:32:07 -0500
From: Chris Metcalf <cmetcalf@...lanox.com>
To: Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>
CC: LKML <linux-kernel@...r.kernel.org>,
John Stultz <john.stultz@...aro.org>,
Ingo Molnar <mingo@...nel.org>,
David Gibson <david@...son.dropbear.id.au>,
Liav Rehana <liavr@...lanox.com>,
Richard Cochran <richardcochran@...il.com>,
Prarit Bhargava <prarit@...hat.com>,
Laurent Vivier <lvivier@...hat.com>,
"Christopher S. Hall" <christopher.s.hall@...el.com>
Subject: Re: [patch 5/6] [RFD] timekeeping: Provide optional 128bit math
On 12/9/2016 3:30 AM, Peter Zijlstra wrote:
> On Fri, Dec 09, 2016 at 07:38:47AM +0100, Peter Zijlstra wrote:
>> On Fri, Dec 09, 2016 at 06:26:38AM +0100, Peter Zijlstra wrote:
>>> Just for giggles, on tilegx the branch is actually slower than doing the
>>> mult unconditionally.
>>>
>>> The problem is that the two multiplies would otherwise completely
>>> pipeline, whereas with the conditional you serialize them.
>> On my Haswell laptop the unconditional version is faster too.
> Only when using x86_64 instructions, once I fixed the i386 variant it
> was slower, probably due to register pressure and the like.
>
>>> (came to light while talking about why the mul_u64_u32_shr() fallback
>>> didn't work right for them, which was a combination of the above issue
>>> and the fact that their compiler 'lost' the fact that these are
>>> 32x32->64 mults and did 64x64 ones instead).
>> Turns out using GCC-6.2.1 we have the same problem on i386, GCC doesn't
>> recognise the 32x32 mults and generates crap.
>>
>> This used to work :/
> Do we want something like so?
>
> ---
> arch/tile/include/asm/Kbuild | 1 -
> arch/tile/include/asm/div64.h | 14 ++++++++++++++
> arch/x86/include/asm/div64.h | 10 ++++++++++
> include/linux/math64.h | 26 ++++++++++++++++++--------
> 4 files changed, 42 insertions(+), 9 deletions(-)
Untested, but I looked at it closely, and it seems like a decent idea.
Acked-by: Chris Metcalf <cmetcalf@...lanox.com> [for tile]
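The quoted discussion centres on the generic mul_u64_u32_shr() fallback and on GCC turning its 32x32->64 products into full 64x64 multiplies. The diffstat suggests per-arch div64.h headers supplying something the generic math64.h can use. Below is a minimal portable-C sketch of that kind of arch-overridable hook, assuming the usual kernel "#ifndef name" override convention. The helper name mul_u32_u32 and the uint*_t types are assumptions for illustration, not code taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Generic 32x32->64 multiply. An arch header included earlier may provide
 * its own version (e.g. inline asm) and "#define mul_u32_u32 mul_u32_u32"
 * so this fallback is skipped. Per the thread, GCC 6.2.1 on i386 may still
 * compile this as a 64x64 multiply, which is the motivation for an override.
 */
#ifndef mul_u32_u32
static inline uint64_t mul_u32_u32(uint32_t a, uint32_t b)
{
	/* Widen one operand only: semantically a single 32x32->64 product. */
	return (uint64_t)a * b;
}
#endif

/* (a * mul) >> shift, truncated to 64 bits, with the conditional high half. */
static inline uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul,
				       unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t ret;

	/* "32 - shift" below is only a valid shift count for shift <= 32. */
	assert(shift <= 32);

	ret = mul_u32_u32(al, mul) >> shift;
	if (ah)	/* this branch serializes the second multiply */
		ret += mul_u32_u32(ah, mul) << (32 - shift);
	return ret;
}
```

Both multiplies go through the one hook, so an architecture that overrides mul_u32_u32 fixes every caller at once.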
Of course if this is pushed up, it will then probably be too tempting for me not
to add the tilegx-specific mul_u64_u32_shr() to take advantage of pipelining
the two 32x32->64 multiplies :-)
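The pipelining remark refers to the two 32x32->64 products inside mul_u64_u32_shr(). Here is a hedged, portable-C sketch of the branch-free form that the thread measured as faster on tilegx. Both products are computed unconditionally, so neither waits on a branch or on the other. This is not tilegx-specific code. The name mul_u64_u32_shr_nobranch is hypothetical, and the sketch assumes shift <= 32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical branch-free variant: (a * mul) >> shift, truncated to 64 bits. */
static inline uint64_t mul_u64_u32_shr_nobranch(uint64_t a, uint32_t mul,
						unsigned int shift)
{
	/* "32 - shift" below is only a valid shift count for shift <= 32. */
	assert(shift <= 32);

	/* Two independent 32x32->64 products: a pipelined core can overlap them. */
	uint64_t lo = (uint64_t)(uint32_t)a * mul;
	uint64_t hi = (uint64_t)(uint32_t)(a >> 32) * mul;

	/* hi is 0 when the top half of a is 0, so no branch is needed. */
	return (lo >> shift) + (hi << (32 - shift));
}
```

Whether this wins depends on the target. Per the thread, it was faster on tilegx and on Haswell with x86_64 instructions, but slower with the i386 variant, probably because of register pressure.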
--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com