Message-ID: <500833b5660949c8b52b756f1c2acc0e@AcuMS.aculab.com>
Date: Fri, 1 Mar 2024 08:53:03 +0000
From: David Laight <David.Laight@...LAB.COM>
To: "'H. Peter Anvin'" <hpa@...or.com>, 'Thorsten Blum'
<thorsten.blum@...lux.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar
<mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
<dave.hansen@...ux.intel.com>, "x86@...nel.org" <x86@...nel.org>
CC: "Peter Zijlstra (Intel)" <peterz@...radead.org>, Wei Liu
<wei.liu@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] x86/apic: Use div64_ul() instead of do_div()
From: H. Peter Anvin
> Sent: 01 March 2024 01:02
>
> >>
> >> Change deltapm to unsigned long and replace do_div() with div64_ul()
> >> which doesn't implicitly cast the divisor and doesn't unnecessarily
> >> calculate the remainder.
> >
> >Eh? they are entirely different beasts.
> >
> >do_div() does a 64 by 32 divide that gives a 32bit quotient.
> >div64_ul() does a much more expensive 64 by 64 divide that
> >can generate a 64bit quotient.
> >
> >The remainder is pretty much free in both cases.
> >If a cpu has a divide instruction it will almost certainly
> >put the quotient in one register and the remainder in another.
> >
>
> Not on e.g. RISC-V.
If the remainder isn't used the compiler should optimise
away any code used to generate it.
gcc is also generating rather sub-optimal code.
On x86 it only does one divide for code that uses 'a / b' and
'a % b', but for riscv it does separate divide and remainder
instructions.
clang does a multiply and subtract for the remainder.
Compared to any form of divide, the extra multiply is noise.
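As a rough illustration of the multiply-and-subtract pattern described above (this is not the code behind the godbolt link, and the name divmod_u64 is made up for this sketch), the remainder can be recovered from the quotient with a single divide:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: one divide, then the
 * remainder is derived as a - q * b instead of using '%'.
 */
static uint64_t divmod_u64(uint64_t a, uint64_t b, uint64_t *rem)
{
	uint64_t q;

	assert(b != 0);		/* division by zero is undefined */
	q = a / b;		/* the only divide */
	*rem = a - q * b;	/* multiply and subtract */
	return q;
}
```

Calling divmod_u64(a, b, &rem) should return the same value as 'a / b' and leave the same value as 'a % b' in rem; whether a given compiler emits this form or a separate remainder instruction depends on the target and its cost model.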
gcc also pessimises attempts to calculate the remainder:
https://godbolt.org/z/Tojh1qcvs
Are the instruction weights set correctly for divide/remainder?
It is almost as though gcc thinks remainder is fast.
Actually I suspect even the 64 by 32 divide is a software loop
on riscv (32bit).
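Not checked, but I suspect the implementations (especially FPGA ones) won't
allow three inputs to the ALU, which a single-instruction 64 by 32 divide
would need (dividend high word, dividend low word and divisor).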
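For reference, a software loop for a 64 by 32 divide could look roughly like the shift-and-subtract sketch below. This is an assumption about the general shape only, not what any particular riscv toolchain or library actually emits, and soft_div64_32 is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: bit-at-a-time (restoring) division of a 64-bit
 * dividend by a 32-bit divisor. Returns the 64-bit quotient and stores
 * the 32-bit remainder.
 */
static uint64_t soft_div64_32(uint64_t n, uint32_t d, uint32_t *rem)
{
	uint64_t q = 0;
	uint64_t r = 0;	/* always < d before the shift, so < 2^33 after it */
	int i;

	assert(d != 0);
	for (i = 63; i >= 0; i--) {
		r = (r << 1) | ((n >> i) & 1);	/* bring down the next bit */
		if (r >= d) {
			r -= d;			/* divisor fits: subtract it */
			q |= 1ULL << i;		/* and set this quotient bit */
		}
	}
	*rem = (uint32_t)r;
	return q;
}
```

The loop runs 64 iterations regardless of the operands, which gives an idea of why such a fallback is far more expensive than a hardware divide.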
David
-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)