Message-ID: <500833b5660949c8b52b756f1c2acc0e@AcuMS.aculab.com>
Date: Fri, 1 Mar 2024 08:53:03 +0000
From: David Laight <David.Laight@...LAB.COM>
To: "'H. Peter Anvin'" <hpa@...or.com>, 'Thorsten Blum'
	<thorsten.blum@...lux.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar
	<mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, Dave Hansen
	<dave.hansen@...ux.intel.com>, "x86@...nel.org" <x86@...nel.org>
CC: "Peter Zijlstra (Intel)" <peterz@...radead.org>, Wei Liu
	<wei.liu@...nel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] x86/apic: Use div64_ul() instead of do_div()

From: H. Peter Anvin
> Sent: 01 March 2024 01:02
> 
> >>
> >> Change deltapm to unsigned long and replace do_div() with div64_ul()
> >> which doesn't implicitly cast the divisor and doesn't unnecessarily
> >> calculate the remainder.
> >
> >Eh? they are entirely different beasts.
> >
> >do_div() does a 64 by 32 divide that gives a 32bit quotient.
> >div64_ul() does a much more expensive 64 by 64 divide that
> >can generate a 64bit quotient.
> >
> >The remainder is pretty much free in both cases.
> >If a cpu has a divide instruction it will almost certainly
> >put the remainder in one register and the quotient in another.
> >
> 
> Not on e.g. RISC-V.

If the remainder isn't used the compiler should optimise
away any code used to generate it.

gcc is also generating rather sub-optimal code.
On x86 it does a single divide for code that uses both 'a / b' and
'a % b', but for riscv it emits separate divide and remainder
instructions.
clang does one divide, then a multiply and subtract for the remainder.

Compared to any form of divide, the extra multiply is noise.
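
As a minimal sketch of the two source forms being compared here (the
function names are made up for illustration, they are not kernel helpers),
the first uses both '/' and '%', the second derives the remainder from the
quotient with a multiply and subtract. What a given compiler emits for
either form depends on the target and version, so check the output rather
than trusting this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: quotient and remainder written the obvious way. */
static void divmod_plain(uint32_t a, uint32_t b, uint32_t *q, uint32_t *r)
{
	assert(b != 0);
	*q = a / b;
	*r = a % b;
}

/*
 * Hypothetical example: one divide, then remainder = a - (a / b) * b.
 * For unsigned operands this is equal to a % b by definition.
 */
static void divmod_mulsub(uint32_t a, uint32_t b, uint32_t *q, uint32_t *r)
{
	uint32_t quot;

	assert(b != 0);
	quot = a / b;
	*q = quot;
	*r = a - quot * b;
}
```

Both functions return the same values, e.g. 17 and 5 give a quotient of 3
and a remainder of 2.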

gcc also pessimises attempts to calculate the remainder:
https://godbolt.org/z/Tojh1qcvs

Are the instruction weights set correctly for divide/remainder?
It is almost as though gcc thinks remainder is fast.

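Actually I suspect even the 64 by 32 divide is a software loop
on riscv (32bit).
I haven't checked, but a 64 by 32 divide needs three 32bit inputs (both
halves of the dividend plus the divisor) and I suspect the implementations
(especially FPGA ones) won't allow 3 inputs to the ALU.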
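
For illustration only, this is roughly what a software 64 by 32 divide
loop looks like. It is a plain shift-and-subtract sketch under my own
assumptions, with a made-up function name; it is not the kernel's or
libgcc's actual implementation:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: divide a 64-bit value by a 32-bit value one bit
 * at a time. Returns the 64-bit quotient and stores the 32-bit remainder.
 */
static uint64_t div_64_by_32_loop(uint64_t n, uint32_t d, uint32_t *rem)
{
	uint64_t quot = 0;
	uint64_t r = 0;	/* partial remainder, always < d after each step */
	int i;

	assert(d != 0);
	for (i = 63; i >= 0; i--) {
		/* Bring down the next dividend bit. */
		r = (r << 1) | ((n >> i) & 1);
		if (r >= d) {
			r -= d;
			quot |= (uint64_t)1 << i;
		}
	}
	*rem = (uint32_t)r;
	return quot;
}
```

That is 64 iterations of shift, compare and subtract, which is why it is
so much slower than a hardware divide.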

	David
