linux-kernel - [PATCH v3 next 03/10] lib: mul_u64_u64_div_u64() simplify check for a 64bit product

Message-Id: <20250614095346.69130-4-david.laight.linux@gmail.com>
Date: Sat, 14 Jun 2025 10:53:39 +0100
From: David Laight <david.laight.linux@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Cc: David Laight <david.laight.linux@...il.com>,
	u.kleine-koenig@...libre.com,
	Nicolas Pitre <npitre@...libre.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Biju Das <biju.das.jz@...renesas.com>
Subject: [PATCH v3 next 03/10] lib: mul_u64_u64_div_u64() simplify check for a 64bit product

If the product fits in 64 bits, div64_u64() can be used for the divide.
Replace the pre-multiply check (ilog2(a) + ilog2(b) <= 62) with a
simple post-multiply check that the high 64 bits are zero.

This has the advantage of being simpler, more accurate (the ilog2()
test rejects some products that do fit in 64 bits) and less code.
It will always be faster when the product is larger than 64 bits.

Most 64-bit CPUs have a native 64x64=128 bit multiply; it is needed
(for the low 64 bits) even when div64_u64() is called, so the early
check gains nothing and is just extra code.

32-bit CPUs need a compare (etc.) to generate a 64-bit ilog2()
from two 32-bit bit scans, so that is non-trivial.
(Never mind the mess of x86's 'bsr', and any oddball CPU without
fast bit-scan instructions.)
By contrast, the additional instructions for the full 128-bit multiply
result are pretty much one multiply and two adds (typically the
'adc $0,%reg' can run in parallel with the instruction that follows).

The only outliers are 64-bit systems without a 128-bit multiply, and
simple in-order 32-bit ones with a fast bit scan that need extra
instructions to get the high bits of the multiply result.
I doubt it makes much difference to either; the latter is definitely
not mainstream.

Split from patch 3 of v2 of this series.

If anyone is worried about the analysis, they can look at the
generated code for x86 (especially when cmov isn't used).

Signed-off-by: David Laight <david.laight.linux@...il.com>
---
 lib/math/div64.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/math/div64.c b/lib/math/div64.c
index 397578dc9a0b..ed9475b9e1ef 100644
--- a/lib/math/div64.c
+++ b/lib/math/div64.c
@@ -196,9 +196,6 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)
 		return 0;
 	}

-	if (ilog2(a) + ilog2(b) <= 62)
-		return div64_u64(a * b, d);
-
 #if defined(__SIZEOF_INT128__)

 	/* native 64x64=128 bits multiplication */
@@ -222,6 +219,9 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 d)

 #endif

+	if (!n_hi)
+		return div64_u64(n_lo, d);
+
 	if (WARN_ONCE(n_hi >= d,
 		      "%s: division of (%#llx * %#llx = %#llx%016llx) by %#llx overflows, returning ~0",
 		      __func__, a, b, n_hi, n_lo, d))
-- 
2.39.5