Message-ID: <20250401202640.13342a97@pumpkin>
Date: Tue, 1 Apr 2025 20:26:40 +0100
From: David Laight <david.laight.linux@...il.com>
To: Uwe Kleine-König <u.kleine-koenig@...libre.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Nicolas Pitre
<npitre@...libre.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] math64: Provide an uprounding variant of
mul_u64_u64_div_u64()
On Tue, 1 Apr 2025 09:25:17 +0200
Uwe Kleine-König <u.kleine-koenig@...libre.com> wrote:
> Hello David,
>
> On Mon, Mar 31, 2025 at 07:53:57PM +0100, David Laight wrote:
> > On Mon, 31 Mar 2025 18:14:29 +0200
> > Uwe Kleine-König <u.kleine-koenig@...libre.com> wrote:
> > > On Fri, Mar 21, 2025 at 01:18:13PM +0000, David Laight wrote:
> > > > On Wed, 19 Mar 2025 18:14:25 +0100
> > > > Uwe Kleine-König <u.kleine-koenig@...libre.com> wrote:
> > > >
> > > > > This is needed (at least) in the pwm-stm32 driver. Currently the
> > > > > pwm-stm32 driver implements this function itself. This private
> > > > > implementation can be dropped as a followup of this patch.
> > > > >
> > > > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@...libre.com>
> > > > > ---
> > > > >  include/linux/math64.h |  1 +
> > > > >  lib/math/div64.c       | 15 +++++++++++++++
> > > > >  2 files changed, 16 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/math64.h b/include/linux/math64.h
> > > > > index 6aaccc1626ab..0c545b3ddaa5 100644
> > > > > --- a/include/linux/math64.h
> > > > > +++ b/include/linux/math64.h
> > > > > @@ -283,6 +283,7 @@ static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor)
> > > > > #endif /* mul_u64_u32_div */
> > > > >
> > > > > u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div);
> > > > > +u64 mul_u64_u64_div_u64_roundup(u64 a, u64 mul, u64 div);
> > > > >
> > > > > /**
> > > > > * DIV64_U64_ROUND_UP - unsigned 64bit divide with 64bit divisor rounded up
> > > > > diff --git a/lib/math/div64.c b/lib/math/div64.c
> > > > > index 5faa29208bdb..66beb669992d 100644
> > > > > --- a/lib/math/div64.c
> > > > > +++ b/lib/math/div64.c
> > > > > @@ -267,3 +267,18 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c)
> > > > > }
> > > > > EXPORT_SYMBOL(mul_u64_u64_div_u64);
> > > > > #endif
> > > > > +
> > > > > +#ifndef mul_u64_u64_div_u64_roundup
> > > > > +u64 mul_u64_u64_div_u64_roundup(u64 a, u64 b, u64 c)
> > > > > +{
> > > > > +	u64 res = mul_u64_u64_div_u64(a, b, c);
> > > > > +	/* Those multiplications might overflow but it doesn't matter */
> > > > > +	u64 rem = a * b - c * res;
> > > > > +
> > > > > +	if (rem)
> > > > > +		res += 1;
> > > >
> > > > Ugg...
> > > > 	return (((unsigned __int128)a * b) + (c - 1)) / c;
> > > > nearly works (on 64bit) but needs a u64 div_128_64()
> > >
> > > Neither mul_u64_u64_div_u64_roundup() nor mul_u64_u64_div_u64() would
> > > be needed if we had a 128-bit type and a corresponding division on all
> > > supported architectures.
> >
> > True, but the compiler would be doing a 128 by 128 divide - which isn't
> > needed here.
> >
> > But you can rework the code to add in the offset between the multiply
> > and divide - just needs a 'tweak' to mul_u64_u64_div_u64().
>
> Yes, that would be a possibility, but I'm not convinced it gives an
> advantage. Yes, it simplifies mul_u64_u64_div_u64_roundup() a bit, in
> return for making mul_u64_u64_div_u64() a bit more complicated (and it
> is quite complicated already).
Adding in a 64bit offset isn't that much extra.
On most CPUs it is an 'add'/'adc' pair.
Clearly it could be optimised away when the offset is a constant zero,
but the difference will be noise except for the x86-64 asm version.
Even there the extra 2 clocks might not be noticeable, but a separate
version for 'constant zero' wouldn't be that bad.
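
To make that concrete, something like the following is all the 'tweak'
amounts to (only a sketch; the function name and the n_hi/n_lo naming
are mine, not what lib/math/div64.c uses):

#include <linux/types.h>

/*
 * Sketch only: fold a 64bit offset into a 128bit product kept as two
 * u64 halves.  Compilers typically turn the carry test below into the
 * add/adc pair mentioned above on 64bit targets.
 */
static inline void add_u64_to_u128(u64 *n_hi, u64 *n_lo, u64 off)
{
	*n_lo += off;
	if (*n_lo < off)	/* unsigned wrap => carry into the high half */
		*n_hi += 1;
}

Do that between the multiply and the divide and the roundup variant is
just the offset-taking helper called with an offset of c - 1 (the
remainder is always below c), with the current behaviour being the
constant-zero case.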
Looking at the C version, I wonder if the two ilog2() calls are needed.
They may not be cheap, and are the same as checking 'n_hi == 0'.
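
To illustrate (again only a sketch, not the div64.c code; the n_hi/n_lo
names are made up and mul_u64_u64_shr() is used only to get at the high
half of the product):

#include <linux/types.h>
#include <linux/math64.h>

/*
 * Once the 128bit product exists as a hi/lo pair, the question the two
 * ilog2() calls answer - "can the plain 64bit divide be used?" - is
 * just whether the high half is zero.
 */
static u64 mul_u64_u64_div_u64_sketch(u64 a, u64 b, u64 c)
{
	u64 n_hi = mul_u64_u64_shr(a, b, 64);	/* high 64 bits of a * b */
	u64 n_lo = a * b;			/* low 64 bits of a * b */

	if (!n_hi)				/* product fits in 64 bits */
		return div64_u64(n_lo, c);

	/* otherwise fall through to the existing 128-by-64 slow path */
	return mul_u64_u64_div_u64(a, b, c);
}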
David
>
> With this patch applied and drivers/pwm/pwm-stm32.c making use of it we
> have:
>
> linux$ git grep '\<mul_u64_u64_div_u64\>' | wc -l
> 56
> linux$ git grep '\<mul_u64_u64_div_u64_roundup\>' | wc -l
> 7
>
> where 13 of the former and 4 of the latter are hits in the respective
> implementation itself or in comments and tests, so there are ~14 times
> more users of the downrounding variant, and I don't want to penalize those.
>
> Best regards
> Uwe