Message-Id: <20250614095346.69130-1-david.laight.linux@gmail.com>
Date: Sat, 14 Jun 2025 10:53:36 +0100
From: David Laight <david.laight.linux@...il.com>
To: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Cc: David Laight <david.laight.linux@...il.com>,
u.kleine-koenig@...libre.com,
Nicolas Pitre <npitre@...libre.com>,
Oleg Nesterov <oleg@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Biju Das <biju.das.jz@...renesas.com>
Subject: [PATCH v3 next 00/10] Implement mul_u64_u64_div_u64_roundup()
The pwm-stm32.c code wants a 'rounding up' version of mul_u64_u64_div_u64().
This can be done simply by adding 'divisor - 1' to the 128bit product.
Implement mul_u64_add_u64_div_u64(a, b, c, d) = (a * b + c)/d based on the
existing code.
Define mul_u64_u64_div_u64(a, b, d) as mul_u64_add_u64_div_u64(a, b, 0, d) and
mul_u64_u64_div_u64_roundup(a, b, d) as mul_u64_add_u64_div_u64(a, b, d-1, d).
Only x86-64 has an optimised (asm) version of the function.
That version is optimised to avoid the 'add c' when c is known to be zero.
In all other cases the extra code is noise compared to the software
divide code.
The test module has been updated to test mul_u64_u64_div_u64_roundup() and
enhanced to verify the C division code on x86-64 and the 32bit
division code on 64bit.
Changes for v2:
- Rename the 'divisor' parameter from 'c' to 'd'.
- Add an extra patch to use BUG_ON() to trap zero divisors.
- Remove the last patch that ran the C code on x86-64
(I've a plan to do that differently).
Changes for v3:
- Replace the BUG_ON() (or panic in the original version) for zero
divisors with a WARN_ONCE() and return zero.
- Remove the 'pre-multiply' check for small products; it is
  completely non-trivial on 32bit systems.
- Use mul_u32_u32() and the new add_u64_u32() to stop gcc generating
pretty much pessimal code for x86 with lots of register spills.
- Replace the 'bit at a time' divide with one that generates 16 bits
per iteration on 32bit systems and 32 bits per iteration on 64bit.
Massively faster, the tests run in under 1/3 the time.
David Laight (10):
lib: mul_u64_u64_div_u64() rename parameter 'c' to 'd'
lib: mul_u64_u64_div_u64() Use WARN_ONCE() for divide errors.
lib: mul_u64_u64_div_u64() simplify check for a 64bit product
lib: Add mul_u64_add_u64_div_u64() and mul_u64_u64_div_u64_roundup()
lib: Add tests for mul_u64_u64_div_u64_roundup()
lib: test_mul_u64_u64_div_u64: Test both generic and arch versions
lib: mul_u64_u64_div_u64() optimise multiply on 32bit x86
lib: mul_u64_u64_div_u64() Separate multiply to a helper for clarity
lib: mul_u64_u64_div_u64() Optimise the divide code
lib: test_mul_u64_u64_div_u64: Test the 32bit code on 64bit
arch/x86/include/asm/div64.h | 33 ++++-
include/linux/math64.h | 56 ++++++++-
lib/math/div64.c | 189 +++++++++++++++++++---------
lib/math/test_mul_u64_u64_div_u64.c | 187 +++++++++++++++++++--------
4 files changed, 349 insertions(+), 116 deletions(-)
--
2.39.5