Message-ID: <20250723224831.4492ec75@pumpkin>
Date: Wed, 23 Jul 2025 22:48:30 +0100
From: David Laight <david.laight.linux@...il.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>, Borislav Petkov <bp@...en8.de>, Dave
Hansen <dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>,
"Li,Rongqing" <lirongqing@...du.com>, Steven Rostedt <rostedt@...dmis.org>,
linux-kernel@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH] x86/math64: handle #DE in mul_u64_u64_div_u64()
On Wed, 23 Jul 2025 11:38:25 +0200
Oleg Nesterov <oleg@...hat.com> wrote:
> On 07/22, David Laight wrote:
> >
> > On Tue, 22 Jul 2025 15:21:48 +0200
> > Oleg Nesterov <oleg@...hat.com> wrote:
> >
> > > static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> > > {
> > > 	char ok = 0;
> > > 	u64 q;
> > >
> > > 	asm ("mulq %3; 1: divq %4; movb $1,%1; 2:\n"
> > > 		_ASM_EXTABLE(1b, 2b)
> > > 		: "=a" (q), "+q" (ok)
> > > 		: "a" (a), "rm" (mul), "rm" (div)
> > > 		: "rdx");
> > >
> > > 	if (ok)
> > > 		return q;
> > > 	BUG_ON(!div);
> > > 	WARN_ONCE(1, "muldiv overflow.\n");
> >
> > I wonder what WARN_ON_ONCE("muldiv overflow") outputs?
>
> Well, it outputs "muldiv overflow." ;) So I am not sure it is better
> than just WARN_ON_ONCE(1).
>
> > Actually, without the BUG or WARN you want:
> > u64 fail = ~(u64)0;
> > then
> > incq $1 ... "+r" (fail)
> > and finally
> > return q | fail;
> > to remove the conditional branches from the normal path
> > (apart from one the caller might do)
>
> I was thinking about
>
> static inline u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div)
> {
> 	u64 q;
>
> 	asm ("mulq %2; 1: divq %3; jmp 3f; 2: movq $-1,%0; 3:\n"
> 		_ASM_EXTABLE(1b, 2b)
> 		: "=a" (q)
> 		: "a" (a), "rm" (mul), "rm" (div)
> 		: "rdx");
>
> 	return q;
> }
>
> to remove the conditional branch and additional variable. Your version
> is probably better... But this is without WARN/BUG.
I wish there was a way of doing a WARN_ONCE from asm with a single instruction.
Then you could put one after your 2:
Otherwise it is a conditional and a load of inlined code.
> So, which version do you prefer?
I wish I knew :-)
Yours is a few bytes shorter, uses one less register, but has that unconditional jmp.
I suspect we don't worry about the cpu not predicting a jump - especially with
the divq.
It's not as though some real-time code relies on this code being as fast
as absolutely possible.
Not using a register is probably the main win.
So maybe I lose (this time).
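
For anyone wanting to check callers against the intended result, here is a rough userspace sketch of the contract both asm versions aim for. It is only a model: the name muldiv_model() is made up, nothing traps, and the two #DE cases are tested explicitly instead of going through the exception table. It assumes a compiler with unsigned __int128 (gcc or clang on a 64-bit target).

```c
#include <stdint.h>

/*
 * Userspace model of the contract only, NOT the kernel code.
 * muldiv_model() is a hypothetical name used for illustration.
 * The asm versions let divq raise #DE and recover via the exception
 * table; here the trapping cases are detected up front.
 */
static inline uint64_t muldiv_model(uint64_t a, uint64_t mul, uint64_t div)
{
	/* 64x64 -> 128 bit product, as mulq leaves in rdx:rax */
	unsigned __int128 prod = (unsigned __int128)a * mul;
	uint64_t hi = (uint64_t)(prod >> 64);

	/*
	 * divq faults when div is zero or when the quotient does not
	 * fit in 64 bits. Both cases reduce to: high half >= div.
	 */
	if (hi >= div)
		return ~(uint64_t)0;

	return (uint64_t)(prod / div);
}
```

Note that ~0 is also a legitimate quotient (for example a = ~0, mul = 2, div = 2), so a caller cannot tell overflow apart from that result by the return value alone.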
Further work could add an 'int *' parameter that is set non-zero (from %rax)
if the divide traps; optimised out if NULL.
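
Something like the following is what I mean by the optional pointer, again only as a userspace model with a made-up name (muldiv_flag_model) and no asm. Two things here are assumptions on my part: the flag is written as 0 or 1 on every call rather than only on a trap, and the value is a plain 1 rather than whatever is left in %rax. The always_inline attribute is there so that a constant NULL argument lets the compiler drop the store.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model only; muldiv_flag_model() is a hypothetical name.
 * 'failed' may be NULL. When the call is inlined with a constant NULL
 * the 'if (failed)' test and the store can be optimised away.
 */
static inline __attribute__((always_inline)) uint64_t
muldiv_flag_model(uint64_t a, uint64_t mul, uint64_t div, int *failed)
{
	unsigned __int128 prod = (unsigned __int128)a * mul;
	/* Same condition under which divq would raise #DE */
	int trap = (uint64_t)(prod >> 64) >= div;

	/* Assumption: report success as 0 as well as failure as 1 */
	if (failed)
		*failed = trap;

	return trap ? ~(uint64_t)0 : (uint64_t)(prod / div);
}
```

A caller that cares would pass &err and test it; everyone else passes NULL and just gets ~0 on overflow.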
The easy way is two copies of the asm statement.
But I've already got two copies in the version that does (a * b + c)/d
and four copies is getting silly.
Actually this seems ok - at least as a real function:
u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 c, u64 div)
{
	unsigned __int128 v = (__int128)a * b + c;

	asm ("1: divq %1; jmp 3f; 2: movq $-1,%%rax; 3:\n"
		_ASM_EXTABLE(1b, 2b)
		: "+A" (v)
		: "r" (div));

	return v;
}
But (as I found with 32bit) gcc can decide to do a 128x128 multiply.
It does do a full 128bit add - with an extra register for the zero.
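
One possible way to keep the add on explicit 64-bit halves is sketched below. This is a userspace model with a made-up name (mul_add_div_model), it has no asm, and I have not checked that it gives better code; whether the compiler really emits one mulq followed by add/adc needs looking at the generated code. It relies on __builtin_add_overflow(), which gcc and clang both provide.

```c
#include <stdint.h>

/*
 * Userspace model only; mul_add_div_model() is a hypothetical name.
 * Computes (a * b + c) / div, returning ~0 where divq would trap.
 */
static inline uint64_t mul_add_div_model(uint64_t a, uint64_t b,
					 uint64_t c, uint64_t div)
{
	/* Only one operand is widened, so this is a 64x64 -> 128 multiply */
	unsigned __int128 prod = (unsigned __int128)a * b;
	uint64_t lo = (uint64_t)prod;
	uint64_t hi = (uint64_t)(prod >> 64);

	/*
	 * Add c to the low half and propagate the carry. The high half
	 * of a 64x64 product is at most 2^64 - 2, so hi cannot wrap.
	 */
	hi += __builtin_add_overflow(lo, c, &lo);

	/* div == 0 or quotient too big for 64 bits */
	if (hi >= div)
		return ~(uint64_t)0;

	return (uint64_t)((((unsigned __int128)hi << 64) | lo) / div);
}
```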
Note that you should never pass "rm" to clang; it needs to be "r".
There is a #define for it.
David
>
> Oleg.
>