[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aEfQauoqhEs2DkiG@vaxr-BM6660-BM6360>
Date: Tue, 10 Jun 2025 14:27:54 +0800
From: I Hsin Cheng <richard120310@...il.com>
To: John Stultz <jstultz@...gle.com>
Cc: sboyd@...nel.org, tglx@...utronix.de, linux-kernel@...r.kernel.org,
skhan@...uxfoundation.org, linux-kernel-mentees@...ts.linux.dev,
jserv@...s.ncku.edu.tw
Subject: Re: [RFC PATCH] clocksource: Enhancement for clocks_calc_mult_shift()
On Mon, Jun 09, 2025 at 05:32:05PM -0700, John Stultz wrote:
> Thanks so much for sending this optimization in! It looks promising,
> but a few comments below:
>
> On Mon, Jun 9, 2025 at 12:46 PM I Hsin Cheng <richard120310@...il.com> wrote:
> >
> > Previously, counting the value of "sftacc" within
> > "clocks_calc_mult_shift()" use a while loop to determine the amount of
> > decrement operation for "sftacc".
>
> This is just a direct translation of the code into english, without
> any context or explanation as to *why* the code is doing what it is
> doing.
>
> It might be better to explain first *why* the calculation is being
> done, and then get into the limitations of the current implementation.
>
Sure thing, I'll change this part.
> > It's equivalent to the position of MSB of "tmp", for which we can
> > derive from (32 - __builtin_clz(tmp)). Use 32 instead of 31 here because
> > 1UL should be treat as an amount of 1, not the index 0, and even though
> > "tmp" is of type u64, since it's already shifted right by 32, only the
> > lowest 32 bits will be possible to have non-zero value.
>
> I was having a bit of a hard time parsing this paragraph, so it
> probably needs some more work.
>
> On first skim, this also was a bit confusing as __builtin_clz(u64)
> *seems* like it could return greater than 32 (but that would be
> __builtin_clzll). So the (32 - clz(u64)) bit might confuse folks, as
> It's pretty subtle that __builtin_clz() only works on the bottom
> 32bits. The code should definitely get clear a comment on this.
>
> And thinking more, would using ffs() be more straightforward here for
> what is wanted?
>
This is a nice head up, thanks so much ! I'll switch to ffs() and see
how it goes.
> > This change is tested against a test script [1].
> > Result shown that it can save a significant amount of execution overhead
> > for this function, which is invoked often, the whole system would likely
> > benefit from it.
> >
> > -----------------------------
> > | old version | new version |
> > -----------------------------
> > | 11500.6 ns | 44 ns |
> > -----------------------------
> >
> > The execution time is reduced around 99.7%
>
> That is impressive, but should probably be contextualized around how
> often one might expect clocks_calc_mult_shift() to be called.
> It still looks like a good optimization, but we shouldn't over-sell it
> unless there's a concrete impact to users.
>
No problem, I'll try to use bpf or some debug prints approach to test
the invoked frequency of this function, what do you think ? maybe
there're better and more elegant approach to do such profiling ?
Btw I ran each test for 10 times and take the average of them, I should
amend this in the commit message.
> Thanks again for sending this along! I'd just take another pass at the
> commit message and the comments and see if this can be made easier to
> understand when someone 5 years from now is looking at the code.
>
> thanks
> -john
Hi John,
Thanks for your review and these suggestions !
I'll refine the commit message and do more test ( e.g. use ffs() ) to
see if it can be better, I agree the commit message might be too
confusing, I'll work on that more, thanks for your time!
Best regards,
I Hsin Cheng
Powered by blists - more mailing lists