linux-kernel - Re: [PATCH v3 next 09/10] lib: mul_u64_u64_div

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <n544n0n5-7r7p-n3o6-67s7-q03r026qo1r4@syhkavp.arg>
Date: Fri, 11 Jul 2025 17:40:57 -0400 (EDT)
From: Nicolas Pitre <nico@...xnic.net>
To: David Laight <david.laight.linux@...il.com>
cc: Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org, 
    u.kleine-koenig@...libre.com, Oleg Nesterov <oleg@...hat.com>, 
    Peter Zijlstra <peterz@...radead.org>, 
    Biju Das <biju.das.jz@...renesas.com>, rostedt@...dmis.org, 
    lirongqing@...du.com
Subject: Re: [PATCH v3 next 09/10] lib: mul_u64_u64_div_u64() Optimise the
 divide code

On Fri, 11 Jul 2025, David Laight wrote:

> On Wed, 9 Jul 2025 15:24:20 +0100
> David Laight <david.laight.linux@...il.com> wrote:
> 
> > On Sat, 14 Jun 2025 10:53:45 +0100
> > David Laight <david.laight.linux@...il.com> wrote:
> > 
> > > Replace the bit by bit algorithm with one that generates 16 bits
> > > per iteration on 32bit architectures and 32 bits on 64bit ones.  
> > 
> > I've spent far too long doing some clock counting exercises on this code.
> > This is the latest version with some conditional compiles and comments
> > explaining the various optimisation.
> > I think the 'best' version is with -DMULDIV_OPT=0xc3
> > 
> >....
> > Some measurements on an ivy bridge.
> > These are the test vectors from the test module with a few extra values on the
> > end that pick different paths through this implementatoin.
> > The numbers are 'performance counter' deltas for 10 consecutive calls with the
> > same values.
> > So the latter values are with the branch predictor 'trained' to the test case.
> > The first few (larger) values show the cost of mispredicted branches.
> 
> I should have included the values for the sames tests with the existing code.
> Added here, but this is includes the change to test for overflow and divide
> by zero after the multiply (no ilog2 calls - which make the 32bit code worse).
> About the only cases where the old code it better are when the quotient
> only has two bits set.

Kudos to you.

I don't think it is necessary to go deeper than this without going crazy.

Given you have your brain wrapped around all the variants at this point, 
I'm deferring to you to create a version for the kernel representing the 
best compromise between performance and  maintainability.

Please consider including the following at the start of your next patch 
series version as this is headed for mainline already:
https://lkml.kernel.org/r/q246p466-1453-qon9-29so-37105116009q@onlyvoer.pbz


Nicolas