[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <n544n0n5-7r7p-n3o6-67s7-q03r026qo1r4@syhkavp.arg>
Date: Fri, 11 Jul 2025 17:40:57 -0400 (EDT)
From: Nicolas Pitre <nico@...xnic.net>
To: David Laight <david.laight.linux@...il.com>
cc: Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org,
u.kleine-koenig@...libre.com, Oleg Nesterov <oleg@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Biju Das <biju.das.jz@...renesas.com>, rostedt@...dmis.org,
lirongqing@...du.com
Subject: Re: [PATCH v3 next 09/10] lib: mul_u64_u64_div_u64() Optimise the
divide code
On Fri, 11 Jul 2025, David Laight wrote:
> On Wed, 9 Jul 2025 15:24:20 +0100
> David Laight <david.laight.linux@...il.com> wrote:
>
> > On Sat, 14 Jun 2025 10:53:45 +0100
> > David Laight <david.laight.linux@...il.com> wrote:
> >
> > > Replace the bit by bit algorithm with one that generates 16 bits
> > > per iteration on 32bit architectures and 32 bits on 64bit ones.
> >
> > I've spent far too long doing some clock counting exercises on this code.
> > This is the latest version with some conditional compiles and comments
> > explaining the various optimisation.
> > I think the 'best' version is with -DMULDIV_OPT=0xc3
> >
> >....
> > Some measurements on an ivy bridge.
> > These are the test vectors from the test module with a few extra values on the
> > end that pick different paths through this implementatoin.
> > The numbers are 'performance counter' deltas for 10 consecutive calls with the
> > same values.
> > So the latter values are with the branch predictor 'trained' to the test case.
> > The first few (larger) values show the cost of mispredicted branches.
>
> I should have included the values for the sames tests with the existing code.
> Added here, but this is includes the change to test for overflow and divide
> by zero after the multiply (no ilog2 calls - which make the 32bit code worse).
> About the only cases where the old code it better are when the quotient
> only has two bits set.
Kudos to you.
I don't think it is necessary to go deeper than this without going crazy.
Given you have your brain wrapped around all the variants at this point,
I'm deferring to you to create a version for the kernel representing the
best compromise between performance and maintainability.
Please consider including the following at the start of your next patch
series version as this is headed for mainline already:
https://lkml.kernel.org/r/q246p466-1453-qon9-29so-37105116009q@onlyvoer.pbz
Nicolas
Powered by blists - more mailing lists