Message-ID: <15qr98n0-q1q0-or1r-7r32-36rrq93p9oq6@onlyvoer.pbz>
Date: Tue, 1 Apr 2025 16:13:30 -0400 (EDT)
From: Nicolas Pitre <npitre@...libre.com>
To: David Laight <david.laight.linux@...il.com>
cc: Uwe Kleine-König <u.kleine-koenig@...libre.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] math64: Provide an uprounding variant of
mul_u64_u64_div_u64()
On Tue, 1 Apr 2025, David Laight wrote:
> On Tue, 1 Apr 2025 09:25:17 +0200
> Uwe Kleine-König <u.kleine-koenig@...libre.com> wrote:
>
> > On Mon, Mar 31, 2025 at 07:53:57PM +0100, David Laight wrote:
> >
> > > But you can rework the code to add in the offset between the multiply
> > > and divide - just needs a 'tweak' to mul_u64_u64_div_u64().
> >
> > Yes, that would be a possibility, but I'm not convinced this gives an
> > advantage. Yes it simplifies mul_u64_u64_div_u64_roundup() a bit, in
> > return to making mul_u64_u64_div_u64() a bit more complicated (which is
> > quite complicated already).
>
> Adding in a 64bit offset isn't that much extra.
> On most cpu it is an 'add' 'adc' pair.
> Clearly it could be optimised away if a constant zero, but that will
> be noise except for the x86-64 asm version.

You still have to ask:

- How many users actually need that version?

- For the zero case to be optimized away, you need some inlining, and
  those functions aren't inlined. This means either passing the (albeit
  tiny) overhead on to everybody, or duplicating the core division code,
  meaning a bigger footprint.

- And this is not only about the extra 2 clocks. You need to account for
  passing an extra 64-bit argument, which is most likely to be spilled to
  the stack, especially on 32-bit systems, as the 64x64=128 multiply has
  to be performed before adding the extra argument.

Hence the proposed compromise.
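
For illustration only, the two options weighed above can be sketched in
userspace C. This is not the kernel code and not the patch under review:
the names muldiv_floor(), muldiv_add_floor() and muldiv_roundup_wrapper()
are hypothetical, and the sketch assumes a compiler that provides
unsigned __int128 (GCC or Clang on a 64-bit target) and a quotient that
fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: unsigned __int128 exists (GCC/Clang, 64-bit targets). */
typedef unsigned __int128 u128;

/* Hypothetical stand-in for a floor(a * b / c) helper. */
static uint64_t muldiv_floor(uint64_t a, uint64_t b, uint64_t c)
{
	assert(c != 0);
	return (uint64_t)(((u128)a * b) / c);
}

/*
 * Option 1: add a 64-bit offset between the multiply and the divide.
 * Every caller now passes the extra argument, even when it is zero.
 * The sum cannot overflow 128 bits: (2^64 - 1)^2 + (2^64 - 1) < 2^128.
 */
static uint64_t muldiv_add_floor(uint64_t a, uint64_t b, uint64_t off,
				 uint64_t c)
{
	assert(c != 0);
	return (uint64_t)(((u128)a * b + off) / c);
}

/*
 * Option 2: leave the core helper untouched and round up in a wrapper.
 * The true remainder is smaller than c, so it fits in 64 bits and the
 * wrapping 64-bit products below still yield it exactly.
 */
static uint64_t muldiv_roundup_wrapper(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t res = muldiv_floor(a, b, c);
	uint64_t rem = a * b - c * res;

	return rem ? res + 1 : res;
}
```

Under these assumptions, muldiv_add_floor(a, b, c - 1, c) and
muldiv_roundup_wrapper(a, b, c) return the same rounded-up quotient. The
difference is where the cost lands: option 1 charges every caller for the
extra argument, option 2 only the callers that want rounding up.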
> Looking at the C version, I wonder if the two ilog2() calls are needed.
> They may not be cheap, and are the same as checking 'n_hi == 0'.
Which two calls? I see only one.
And please explain how it can be the same as checking 'n_hi == 0'.
Nicolas