lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <2wwrj6aid2elnnotzfkazierqmzmzpfywyf33ahqs36xh5xz32@rszeetrztsiz>
Date: Wed, 2 Apr 2025 10:16:19 +0200
From: Uwe Kleine-König <u.kleine-koenig@...libre.com>
To: David Laight <david.laight.linux@...il.com>
Cc: Nicolas Pitre <nico@...xnic.net>, 
	Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] math64: Provide an uprounding variant of
 mul_u64_u64_div_u64()

On Tue, Apr 01, 2025 at 10:37:31PM +0100, David Laight wrote:
> On Tue, 1 Apr 2025 16:30:34 -0400 (EDT)
> Nicolas Pitre <nico@...xnic.net> wrote:
> 
> > On Tue, 1 Apr 2025, Nicolas Pitre wrote:
> > 
> > > On Tue, 1 Apr 2025, David Laight wrote:
> > >   
> > > > Looking at the C version, I wonder if the two ilog2() calls are needed.
> > > > They may not be cheap, and are the same as checking 'n_hi == 0'.  
> > > 
> > > Which two calls? I see only one.  
> > 
> > Hmmm, sorry. If by ilog2() you mean the clz's then those are cheap. Most 
> > CPUs have a dedicated instruction for that. The ctz, though, is more 
> > expensive (unless it is ARMv7 and above with an RBIT instruction).
> 
> The code (6.14-rc6) starts:
> 
> u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c)
> {
> 	if (ilog2(a) + ilog2(b) <= 62)
> 		return div64_u64(a * b, c);
> 
> Now ilog2() is (probably) much the same as clz - but not all cpu have it.
> Indeed clz(a) + clz(b) >= 64 would be a more obvious test.

Ack, the more intuitive check might be

	if (fls64(a) + fls64(b) <= 64)
 		return div64_u64(a * b, c);

then, but maybe "intuitive" is subjective here?

> On 32bit a check for a >> 32 | b >> 32 before the multiply might be sane.

Not 100% sure I'm following. `a >> 32 | b >> 32` is just an indicator
that a * b fits into an u64 and in that case the above check is the
better one as this also catches e.g. a = (1ULL << 32) and b = 42.

> > > And please explain how it can be the same as checking 'n_hi == 0'.  
> > 
> > This question still stands.
> 
> 'ni_hi == 0' is exactly the condition under which you can do a 64 bit divide.
> Since it is when 'a * b' fits in 64 bits.
> 
> If you assume that most calls need the 128bit product (or why use the function)
> then there is little point adding code to optimise for the unusual case.

n_hi won't be zero when the branch

	if (ilog2(a) + ilog2(b) <= 62)
		return div64_u64(a * b, c);

wasn't taken?

Best regards
Uwe

Download attachment "signature.asc" of type "application/pgp-signature" (489 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ