Message-ID: <15qr98n0-q1q0-or1r-7r32-36rrq93p9oq6@onlyvoer.pbz>
Date: Tue, 1 Apr 2025 16:13:30 -0400 (EDT)
From: Nicolas Pitre <npitre@...libre.com>
To: David Laight <david.laight.linux@...il.com>
cc: Uwe Kleine-König <u.kleine-koenig@...libre.com>, 
    Andrew Morton <akpm@...ux-foundation.org>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] math64: Provide an uprounding variant of
 mul_u64_u64_div_u64()

On Tue, 1 Apr 2025, David Laight wrote:

> On Tue, 1 Apr 2025 09:25:17 +0200
> Uwe Kleine-König <u.kleine-koenig@...libre.com> wrote:
> 
> > On Mon, Mar 31, 2025 at 07:53:57PM +0100, David Laight wrote:
> > 
> > > But you can rework the code to add in the offset between the multiply
> > > and divide - just needs a 'tweak' to mul_u64_u64_div_u64().  
> > 
> > Yes, that would be a possibility, but I'm not convinced this gives an
> > advantage. Yes it simplifies mul_u64_u64_div_u64_roundup() a bit, in
> > return to making mul_u64_u64_div_u64() a bit more complicated (which is
> > quite complicated already).
> 
> Adding in a 64bit offset isn't that much extra.
> On most cpu it is an 'add' 'adc' pair.
> Clearly it could be optimised away if a constant zero, but that will
> be noise except for the x86-64 asm version.

You still have to ask:

- How many users actually need that version?

- For the zero case to be optimized away, you need inlining, and those 
  functions aren't inlined. This means either passing the (albeit tiny) 
  overhead on to everybody or duplicating the core division code, meaning 
  a bigger footprint.

- And this is not only about the extra 2 clocks. You need to account for 
  passing an extra 64-bit argument, which is most likely to be spilled to 
  the stack, especially on 32-bit systems, as the 64x64=128 multiply has 
  to be performed before adding the extra argument.

Hence the proposed compromise.
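
To make the two shapes being weighed above concrete, here is a rough 
userspace sketch. It is not the kernel implementation and not the code 
from the patch: the function names are hypothetical, and it assumes a 
compiler providing unsigned __int128 (a GCC/Clang extension on 64-bit 
targets), which hides exactly the 128-bit arithmetic that the real 
function has to open-code. The first function takes the extra 64-bit 
offset and adds it between the multiply and the divide; the second is a 
dedicated round-up variant that needs no extra argument.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only: hypothetical names, not the kernel's code.
 * Assumes unsigned __int128 is available (GCC/Clang, 64-bit targets).
 */

/*
 * Shape 1: (a * b + add) / c, with the offset added between the
 * multiply and the divide. Every caller passes the extra argument,
 * even when it is zero. The sum cannot overflow 128 bits:
 * (2^64 - 1)^2 + (2^64 - 1) < 2^128.
 */
static uint64_t mul_add_div_sketch(uint64_t a, uint64_t b,
				   uint64_t add, uint64_t c)
{
	assert(c != 0);
	unsigned __int128 n = (unsigned __int128)a * b;	/* 64x64=128 */

	n += add;		/* typically an add/adc pair */
	return (uint64_t)(n / c);	/* truncated if it exceeds 64 bits */
}

/*
 * Shape 2: a separate round-up variant. Callers that want the
 * truncating result keep using the unchanged function.
 */
static uint64_t mul_div_roundup_sketch(uint64_t a, uint64_t b, uint64_t c)
{
	assert(c != 0);
	unsigned __int128 n = (unsigned __int128)a * b;

	return (uint64_t)(n / c) + (n % c != 0);
}
```

With these definitions, mul_add_div_sketch(a, b, c - 1, c) and 
mul_div_roundup_sketch(a, b, c) agree whenever the result fits in 64 
bits; the difference is only in who pays for the extra argument.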

> Looking at the C version, I wonder if the two ilog2() calls are needed.
> They may not be cheap, and are the same as checking 'n_hi == 0'.

Which two calls? I see only one.

And please explain how it can be the same as checking 'n_hi == 0'.


Nicolas
