[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2wwrj6aid2elnnotzfkazierqmzmzpfywyf33ahqs36xh5xz32@rszeetrztsiz>
Date: Wed, 2 Apr 2025 10:16:19 +0200
From: Uwe Kleine-König <u.kleine-koenig@...libre.com>
To: David Laight <david.laight.linux@...il.com>
Cc: Nicolas Pitre <nico@...xnic.net>,
Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] math64: Provide an uprounding variant of
mul_u64_u64_div_u64()
On Tue, Apr 01, 2025 at 10:37:31PM +0100, David Laight wrote:
> On Tue, 1 Apr 2025 16:30:34 -0400 (EDT)
> Nicolas Pitre <nico@...xnic.net> wrote:
>
> > On Tue, 1 Apr 2025, Nicolas Pitre wrote:
> >
> > > On Tue, 1 Apr 2025, David Laight wrote:
> > >
> > > > Looking at the C version, I wonder if the two ilog2() calls are needed.
> > > > They may not be cheap, and are the same as checking 'n_hi == 0'.
> > >
> > > Which two calls? I see only one.
> >
> > Hmmm, sorry. If by ilog2() you mean the clz's then those are cheap. Most
> > CPUs have a dedicated instruction for that. The ctz, though, is more
> > expensive (unless it is ARMv7 and above with an RBIT instruction).
>
> The code (6.14-rc6) starts:
>
> u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c)
> {
> if (ilog2(a) + ilog2(b) <= 62)
> return div64_u64(a * b, c);
>
> Now ilog2() is (probably) much the same as clz - but not all cpu have it.
> Indeed clz(a) + clz(b) >= 64 would be a more obvious test.
Ack, the more intuitive check might be
if (fls64(a) + fls64(b) <= 64)
return div64_u64(a * b, c);
then, but maybe "intuitive" is subjective here?
> On 32bit a check for a >> 32 | b >> 32 before the multiply might be sane.
Not 100% sure I'm following. `a >> 32 | b >> 32` is just an indicator
that a * b fits into an u64 and in that case the above check is the
better one as this also catches e.g. a = (1ULL << 32) and b = 42.
> > > And please explain how it can be the same as checking 'n_hi == 0'.
> >
> > This question still stands.
>
> 'ni_hi == 0' is exactly the condition under which you can do a 64 bit divide.
> Since it is when 'a * b' fits in 64 bits.
>
> If you assume that most calls need the 128bit product (or why use the function)
> then there is little point adding code to optimise for the unusual case.
n_hi won't be zero when the branch
if (ilog2(a) + ilog2(b) <= 62)
return div64_u64(a * b, c);
wasn't taken?
Best regards
Uwe
Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)
Powered by blists - more mailing lists