linux-kernel - Re: [PATCH v3] clocksource: Replace loop within clocks_calc_mult_shift() with find_last

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aEmSH50geb-2qTBb@vaxr-BM6660-BM6360>
Date: Wed, 11 Jun 2025 22:26:39 +0800
From: I Hsin Cheng <richard120310@...il.com>
To: Yury Norov <yury.norov@...il.com>
Cc: jstultz@...gle.com, tglx@...utronix.de, sboyd@...nel.org,
	linux-kernel@...r.kernel.org, yurynorov@...il.com,
	skhan@...uxfoundation.org, linux-kernel-mentees@...ts.linux.dev,
	jserv@...s.ncku.edu.tw
Subject: Re: [PATCH v3] clocksource: Replace loop within
 clocks_calc_mult_shift() with find_last_bit() for calculation of "sftacc"

On Wed, Jun 11, 2025 at 07:50:04AM -0400, Yury Norov wrote:
> On Wed, Jun 11, 2025 at 03:36:08PM +0800, I Hsin Cheng wrote:
> > Utilize "find_last_bit()" in replacement of while loop counting
> > for the decremenet of "sftacc". They're equivalent in computation result
> > but the former is more effective.
> > 
> > "find_last_bit()" will return the bit number of the last set bit of
> > "tmp", which is 0-based index. Plus 1 to convert it into bit width as
> > desired.
> > 
> > Note that only the lowest 32 bits of "tmp" is taken into consideration
> > of the operation, since it was already shifted right by 32 bits, the
> > topmost 32 bits should remain 0, only the lowest 32 bits are possible to
> > be non-zero.
> > 
> > This change is tested against a test script [1].
> > Run the test 10 times for each version of implementation and take the
> > average. The result shown that with this change, the operation overhead
> > of "clocks_calc_mult_shift()" can be reduced around 99.7% .
> > 
> > -----------------------------
> > | old version | new version |
> > -----------------------------
> > |  11500.6 ns |       44 ns |
> > -----------------------------
> > 
> > Signed-off-by: I Hsin Cheng <richard120310@...il.com>
> > ---
> > Changelog:
> > 
> > v1 -> v2:
> > 	- Refine commit message to explain more about "why"
> > 	- Check the frequency of "clocks_calc_mult_shift()" get called,
> > 	  it's not in hotpath on my machine, refine the commit message
> > to avoid overselling it
> > 	- Add comments for the code to explain the implementation in
> > 	  more detail
> > 	- Handle case for "tmp == 0" to avoid undefined behavior
> > v2 -> v3:
> > 	- Use "find_last_bit()" instead of "__builtin_clz()"
> > 	- Convert the type of "tmp" to "const unsigned long *" when
> > 	  sending into the function
> > 	- Highlight in the comment that only the lowest 32 bits part
> > 	  of "tmp" is taken into consideration
> > 
> > [1]:
> > static int __init test_init(void)
> > {
> >     u32 mult, shift;
> >     u32 from, to, maxsec;
> >     ktime_t start_time, end_time, total_time;
> >     pr_info("Starting clocks_calc_mult_shift simple test\n");
> > 
> >     start_time = ktime_get();
> >     // Test with parameters from 1 to 1000
> >     for (from = 1; from <= 1000; from += 100) {
> >         for (to = 1; to <= 1000; to += 100) {
> >             for (maxsec = 1; maxsec <= 10; maxsec++) {
> > 
> >                 clocks_calc_mult_shift(&mult, &shift, from, to, maxsec);
> >             }
> >         }
> >     }
> > 
> >     end_time = ktime_get();
> >     total_time = ktime_to_ns(ktime_sub(end_time, start_time));
> > 
> >     pr_info("Test completed\n");
> >     pr_info("Total execution time: %lld ns \n", total_time);
> >     return 0;
> > }
> > 
> > The test is running in the form of kernel module.
> > The test machine is running ubuntu 24.04 on x86_64 machine with kernel
> > version of v6.14.0, CPU type is AMD Ryzen 7 5700X3D 8-Core Processor.
> > 
> > Hi John, Yury,
> > 
> > Would you be so kind to give some suggestion/comments on how should the
> > usage of "find_last_bit()" be here ? I'm not sure about whether the type
> > conversion of "tmp" is appropriate, though compiler will pop out warnings
> > if not doing so.
> > 
> > Plus I'm thinking converting to another pointer type might might be correct
> > when the endianess isn't guaranteed ? (or this endianess problem should be
> > address and solved in filesystem layer ?)
> > 
> > Best regards,
> > I Hsin Cheng.
> > ---
> >  kernel/time/clocksource.c | 18 ++++++++++++++----
> >  1 file changed, 14 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> > index 2a7802ec480c..651bed1a53e7 100644
> > --- a/kernel/time/clocksource.c
> > +++ b/kernel/time/clocksource.c
> > @@ -66,10 +66,20 @@ clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 maxsec)
> >  	 * range:
> >  	 */
> >  	tmp = ((u64)maxsec * from) >> 32;
> > -	while (tmp) {
> > -		tmp >>=1;
> > -		sftacc--;
> > -	}
> > +
> > +	/*
> > +	 * Decrement "sftacc" by the number of bits needed to represent "tmp".
> > +	 * Using "find_last_bit(&tmp, 32) + 1" to get the bit width:
> > +	 * - find_last_bit(&tmp, 32) returns the bit number of the last set bit
> > +	 * - Plus 1 to convert 0-based index into bit width as desired
> > +	 *
> > +	 * Note: Only the lowest 32 bits of "tmp" is taken into consideration,
> > +	 *		 since it was already shifted right by 32 bits, the topmost 32
> > +	 *		 bits are guaranteed to be 0.
> > +	 *
> > +	 */
> > +	if (sftacc)
> > +		sftacc -= (find_last_bit((const unsigned long *)&tmp, 32) + 1);
> 

Hi Yury,

Thanks for your suggestions !

> 1. sftacc is known to be 32. Comparing against 0 is useless.
> 2. Just use __fls():
>         if (tmp)
>                 sftacc -=__fls(tmp) + 1;
> 

No problem, I'll fix them up in the next version.
Just wondering the reason to use __fls() directly, is it because we're
sure that the value of "tmp" will definitely fall into
small_const_nbits() case in find_last_bit() ? 

Best regards,
I Hsin Cheng

> >  
> >  	/*
> >  	 * Find the conversion shift/mult pair which has the best
> > -- 
> > 2.43.0