Message-ID: <20250401202640.13342a97@pumpkin>
Date: Tue, 1 Apr 2025 20:26:40 +0100
From: David Laight <david.laight.linux@...il.com>
To: Uwe Kleine-König <u.kleine-koenig@...libre.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Nicolas Pitre
 <npitre@...libre.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] math64: Provide an uprounding variant of
 mul_u64_u64_div_u64()

On Tue, 1 Apr 2025 09:25:17 +0200
Uwe Kleine-König <u.kleine-koenig@...libre.com> wrote:

> Hello David,
> 
> On Mon, Mar 31, 2025 at 07:53:57PM +0100, David Laight wrote:
> > On Mon, 31 Mar 2025 18:14:29 +0200
> > Uwe Kleine-König <u.kleine-koenig@...libre.com> wrote:  
> > > On Fri, Mar 21, 2025 at 01:18:13PM +0000, David Laight wrote:  
> > > > On Wed, 19 Mar 2025 18:14:25 +0100
> > > > Uwe Kleine-König <u.kleine-koenig@...libre.com> wrote:
> > > >     
> > > > > This is needed (at least) in the pwm-stm32 driver. Currently the
> > > > > pwm-stm32 driver implements this function itself. This private
> > > > > implementation can be dropped as a followup of this patch.
> > > > > 
> > > > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@...libre.com>
> > > > > ---
> > > > >  include/linux/math64.h |  1 +
> > > > >  lib/math/div64.c       | 15 +++++++++++++++
> > > > >  2 files changed, 16 insertions(+)
> > > > > 
> > > > > diff --git a/include/linux/math64.h b/include/linux/math64.h
> > > > > index 6aaccc1626ab..0c545b3ddaa5 100644
> > > > > --- a/include/linux/math64.h
> > > > > +++ b/include/linux/math64.h
> > > > > @@ -283,6 +283,7 @@ static inline u64 mul_u64_u32_div(u64 a, u32 mul, u32 divisor)
> > > > >  #endif /* mul_u64_u32_div */
> > > > >  
> > > > >  u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div);
> > > > > +u64 mul_u64_u64_div_u64_roundup(u64 a, u64 mul, u64 div);
> > > > >  
> > > > >  /**
> > > > >   * DIV64_U64_ROUND_UP - unsigned 64bit divide with 64bit divisor rounded up
> > > > > diff --git a/lib/math/div64.c b/lib/math/div64.c
> > > > > index 5faa29208bdb..66beb669992d 100644
> > > > > --- a/lib/math/div64.c
> > > > > +++ b/lib/math/div64.c
> > > > > @@ -267,3 +267,18 @@ u64 mul_u64_u64_div_u64(u64 a, u64 b, u64 c)
> > > > >  }
> > > > >  EXPORT_SYMBOL(mul_u64_u64_div_u64);
> > > > >  #endif
> > > > > +
> > > > > +#ifndef mul_u64_u64_div_u64_roundup
> > > > > +u64 mul_u64_u64_div_u64_roundup(u64 a, u64 b, u64 c)
> > > > > +{
> > > > > +	u64 res = mul_u64_u64_div_u64(a, b, c);
> > > > > +	/* Those multiplications might overflow but it doesn't matter */
> > > > > +	u64 rem = a * b - c * res;
> > > > > +
> > > > > +	if (rem)
> > > > > +		res += 1;    
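
(For the record, the remainder trick in the hunk above is sound: the true
remainder r = a * b - c * res satisfies 0 <= r < c, so it fits in a u64,
and computing both products truncated mod 2^64 still yields exactly r.
The code only needs to know whether r is zero; all of this assumes the
quotient itself fits in a u64, which the interface requires anyway.)
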
> > > > 
> > > > Ugg...
> > > > 	return (((unsigned __int128)a * b) + (c - 1)) / c;
> > > > nearly works (on 64bit) but needs a u64 div_128_64()    
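
Spelled out (sketch only: this leans on gcc/clang's unsigned __int128,
which only exists on 64-bit targets, and the division below is the
128-by-64 step that really wants a dedicated div_128_64() rather than
the full 128-by-128 libgcc divide):

	static inline u64 mul_u64_u64_div_u64_roundup_128(u64 a, u64 b, u64 c)
	{
		/* a * b <= (2^64 - 1)^2, so adding c - 1 < 2^64 cannot overflow 128 bits */
		unsigned __int128 n = (unsigned __int128)a * b + (c - 1);

		return n / c;
	}
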
> > > 
> > > Both mul_u64_u64_div_u64_roundup() and mul_u64_u64_div_u64() would not
> > > be needed if we had a 128 bit type and a corresponding division on all
> > > supported architectures.  
> > 
> > True, but the compiler would be doing a 128 by 128 divide - which isn't
> > needed here.
> > 
> > But you can rework the code to add in the offset between the multiply
> > and divide - just needs a 'tweak' to mul_u64_u64_div_u64().  
> 
> Yes, that would be a possibility, but I'm not convinced this gives an
> advantage. Yes, it simplifies mul_u64_u64_div_u64_roundup() a bit, in
> return for making mul_u64_u64_div_u64() a bit more complicated (which is
> quite complicated already).

Adding in a 64bit offset isn't that much extra.
On most CPUs it is an 'add' + 'adc' pair.
Clearly it could be optimised away when the offset is a constant zero, but
that will be noise everywhere except the x86-64 asm version.
Even there the extra two clocks might not be noticeable, but a separate
version for the 'constant zero' case wouldn't be that bad.
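
For illustration (sketch only: the names are made up, and the __int128
body just stands in for folding a 64-bit addend into the existing
128-bit product code):

	static u64 mul_u64_add_u64_div_u64(u64 a, u64 b, u64 add, u64 c)
	{
		/* a * b + add cannot overflow 128 bits for any u64 inputs */
		unsigned __int128 n = (unsigned __int128)a * b + add;

		return n / c;
	}

	static inline u64 mul_u64_u64_div_u64_roundup(u64 a, u64 b, u64 c)
	{
		return mul_u64_add_u64_div_u64(a, b, c - 1, c);
	}

and the existing mul_u64_u64_div_u64() just passes 0 for 'add'.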

Looking at the C version, I wonder whether the two ilog2() calls are needed.
They may not be cheap, and the test they implement amounts to checking
'n_hi == 0'.
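
Illustrative only (invented names, not the actual div64.c code): once the
product is held as an n_hi:n_lo pair, the early-out could simply be

	if (n_hi == 0)
		return div64_u64(n_lo, c);

with no ilog2() needed.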

	David

> 
> With this patch applied and drivers/pwm/pwm-stm32.c making use of it we
> have:
> 
> linux$ git grep '\<mul_u64_u64_div_u64\>' | wc -l
> 56
> linux$ git grep '\<mul_u64_u64_div_u64_roundup\>' | wc -l
> 7
> 
> where 13 of the former and 4 of the latter are matches in the respective
> implementation, comments and tests, so there are ~14 times more users of
> the downrounding variant, and I don't want to penalize those.
> 
> Best regards
> Uwe

