linux-kernel - Re: [RFC PATCH] clocksource: Enhancement for clocks_calc_mult

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANDhNCrVtK9Tv+d1=z1yrh4OxiS+neENQejD5pCi+R=Cwra9Xg@mail.gmail.com>
Date: Mon, 9 Jun 2025 17:32:05 -0700
From: John Stultz <jstultz@...gle.com>
To: I Hsin Cheng <richard120310@...il.com>
Cc: sboyd@...nel.org, tglx@...utronix.de, linux-kernel@...r.kernel.org, 
	skhan@...uxfoundation.org, linux-kernel-mentees@...ts.linux.dev, 
	jserv@...s.ncku.edu.tw
Subject: Re: [RFC PATCH] clocksource: Enhancement for clocks_calc_mult_shift()

Thanks so much for sending this optimization in! It looks promising,
but a few comments below:

On Mon, Jun 9, 2025 at 12:46 PM I Hsin Cheng <richard120310@...il.com> wrote:
>
> Previously, counting the value of "sftacc" within
> "clocks_calc_mult_shift()" use a while loop to determine the amount of
> decrement operation for "sftacc".

This is just a direct translation of the code into english, without
any context or explanation as to *why* the code is doing what it is
doing.

It might be better to explain first *why* the calculation is being
done, and then get into the limitations of the current implementation.

> It's equivalent to the position of MSB of "tmp", for which we can
> derive from (32 - __builtin_clz(tmp)). Use 32 instead of 31 here because
> 1UL should be treat as an amount of 1, not the index 0, and even though
> "tmp" is of type u64, since it's already shifted right by 32, only the
> lowest 32 bits will be possible to have non-zero value.

I was having a bit of a hard time parsing this paragraph, so it
probably needs some more work.

On first skim, this also was a bit confusing as __builtin_clz(u64)
*seems* like it could return greater than 32 (but that would be
__builtin_clzll).  So the (32 - clz(u64)) bit might confuse folks, as
It's pretty subtle that __builtin_clz() only works on the bottom
32bits.  The code should definitely get clear a comment on this.

And thinking more, would using ffs() be more straightforward here for
what is wanted?

> This change is tested against a test script [1].
> Result shown that it can save a significant amount of execution overhead
> for this function, which is invoked often, the whole system would likely
> benefit from it.
>
> -----------------------------
> | old version | new version |
> -----------------------------
> |  11500.6 ns |       44 ns |
> -----------------------------
>
> The execution time is reduced around 99.7%

That is impressive, but should probably be contextualized around how
often one might expect clocks_calc_mult_shift() to be called.
It still looks like a good optimization, but we shouldn't over-sell it
unless there's a concrete impact to users.

Thanks again for sending this along! I'd just take another pass at the
commit message and the comments and see if this can be made easier to
understand when someone 5 years from now is looking at the code.

thanks
-john