linux-kernel - Re: [PATCH] cpumask: Optimize cpumask_any


Message-ID: <Z4qGNgu_HYp5LK6D@thinkpad>
Date: Fri, 17 Jan 2025 11:32:54 -0500
From: Yury Norov <yury.norov@...il.com>
To: I Hsin Cheng <richard120310@...il.com>
Cc: Kuan-Wei Chiu <visitorckw@...il.com>, linux@...musvillemoes.dk,
	jserv@...s.ncku.edu.tw, mark.rutland@....com,
	linux-kernel@...r.kernel.org, eleanor15x@...il.com
Subject: Re: [PATCH] cpumask: Optimize cpumask_any_but()

On Fri, Jan 17, 2025 at 10:59:31PM +0800, I Hsin Cheng wrote:
> On Fri, Jan 17, 2025 at 10:26:58PM +0800, Kuan-Wei Chiu wrote:
> > The cpumask_any_but() function can avoid using a loop to determine the
> > CPU index to return. If the first set bit in the cpumask is not equal
> > to the specified CPU, we can directly return the index of the first set
> > bit. Otherwise, we return the next set bit's index.
> > 
> > This optimization replaces the loop with a single if statement,
> > allowing the compiler to generate more concise and efficient code.

I thought compilers were smart enough to unroll the loop in this case.
Can you show the disassembled code before and after?

> > 
> > As a result, the size of the bzImage built with x86 defconfig is
> > reduced by 4096 bytes:
> > 
> > * Before:
> > $ size arch/x86/boot/bzImage
> >     text    data     bss      dec     hex filename
> > 13537280    1024       0 13538304  ce9400 arch/x86/boot/bzImage
> > 
> > * After:
> > $ size arch/x86/boot/bzImage
> >     text    data     bss      dec     hex filename
> > 13533184    1024       0 13534208  ce8400 arch/x86/boot/bzImage

Comparing zipped images tells little about code generation. Please use
scripts/bloat-o-meter.

> > 
> > Co-developed-by: Yu-Chun Lin <eleanor15x@...il.com>
> > Signed-off-by: Yu-Chun Lin <eleanor15x@...il.com>
> > Signed-off-by: Kuan-Wei Chiu <visitorckw@...il.com>
> > ---
> > Not sure how to measure the efficiency difference, but I guess this
> > patch might be slightly more efficient or nearly the same as before. If
> > you have any good ideas for measuring efficiency, please let me know!

Check lib/find_bit_benchmark.c

> > 
> >  include/linux/cpumask.h | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
> > index 9278a50d514f..b769fcdbaa10 100644
> > --- a/include/linux/cpumask.h
> > +++ b/include/linux/cpumask.h
> > @@ -404,10 +404,10 @@ unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
> >  	unsigned int i;
> >  
> >  	cpumask_check(cpu);
> > -	for_each_cpu(i, mask)
> > -		if (i != cpu)
> > -			break;
> > -	return i;
> > +	i = find_first_bit(cpumask_bits(mask), small_cpumask_bits);
> 
> Hi Kuan-Wei,
> 
> How about using cpumask_first(mask) here to keep better consistency?

I would do it the other way: introduce find_first_but_bit(), and then
make cpumask_any_but() a wrapper around it. That way, you'll be able
to use the find_bit_benchmark infrastructure to measure the performance
difference, if any.
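
A minimal userspace sketch of that idea follows. find_first_but_bit()
does not exist in the code under discussion; the name comes from the
suggestion above, and its signature, the sketch_find_next_bit()
stand-in, and the sketch_cpumask_any_but() wrapper are all assumptions
made for illustration, not kernel code.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Userspace stand-in for the kernel's find_next_bit(): returns the index
 * of the first set bit at or after 'start', or 'nbits' if there is none.
 * A plain bit-by-bit scan, written for clarity rather than speed.
 */
static unsigned long sketch_find_next_bit(const unsigned long *addr,
					  unsigned long nbits,
					  unsigned long start)
{
	unsigned long i;

	for (i = start; i < nbits; i++)
		if (addr[i / SKETCH_BITS_PER_LONG] &
		    (1UL << (i % SKETCH_BITS_PER_LONG)))
			return i;
	return nbits;
}

/*
 * Hypothetical helper: index of the first set bit other than 'but',
 * or 'nbits' if no such bit exists.
 */
static unsigned long find_first_but_bit(const unsigned long *addr,
					unsigned long nbits,
					unsigned long but)
{
	unsigned long i = sketch_find_next_bit(addr, nbits, 0);

	/* Empty bitmap, or the first set bit is already acceptable. */
	if (i >= nbits || i != but)
		return i;
	return sketch_find_next_bit(addr, nbits, i + 1);
}

/*
 * Hypothetical wrapper in the spirit of cpumask_any_but(). The assert
 * stands in for cpumask_check(); 'nr_cpus' stands in for the mask size.
 */
static unsigned int sketch_cpumask_any_but(const unsigned long *mask_bits,
					   unsigned int nr_cpus,
					   unsigned int cpu)
{
	assert(cpu < nr_cpus);
	return (unsigned int)find_first_but_bit(mask_bits, nr_cpus, cpu);
}
```

With the search logic isolated in a bitmap helper like this, it could be
timed on its own, independent of any cpumask caller.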
 
> > +	if (i != cpu)
> > +		return i;
> Wouldn't it help a bit to check "i >= nr_cpu_ids" prior to
> find_next_bit()?

Yes it would.

Thanks,
Yury

> If "i >= nr_cpu_ids" holds, it would be a waste to
> perform find_next_bit().
> 
> > +	return find_next_bit(cpumask_bits(mask), small_cpumask_bits, i + 1);
> >  }
> >  
> 
> Regards,
> I Hsin
> 
> >  /**
> > -- 
> > 2.34.1
> >
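
For illustration, the loop removed by the patch and the branch form with
the extra "i >= nr_cpu_ids" check can be compared in userspace. This is
only a sketch under stated assumptions: sketch_next_bit() is a simple
stand-in for find_first_bit()/find_next_bit(), 'nbits' stands in for
nr_cpu_ids and small_cpumask_bits, and the function names are
hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* First set bit at or after 'start', or 'nbits' if there is none. */
static unsigned int sketch_next_bit(const unsigned long *bits,
				    unsigned int nbits, unsigned int start)
{
	unsigned int i;

	for (i = start; i < nbits; i++)
		if (bits[i / SKETCH_BITS_PER_LONG] &
		    (1UL << (i % SKETCH_BITS_PER_LONG)))
			return i;
	return nbits;
}

/* Loop form, mirroring the lines the patch removes. */
static unsigned int any_but_loop(const unsigned long *bits,
				 unsigned int nbits, unsigned int cpu)
{
	unsigned int i;

	for (i = sketch_next_bit(bits, nbits, 0); i < nbits;
	     i = sketch_next_bit(bits, nbits, i + 1))
		if (i != cpu)
			break;
	return i;
}

/* Branch form, with the early exit suggested in the review. */
static unsigned int any_but_branch(const unsigned long *bits,
				   unsigned int nbits, unsigned int cpu)
{
	unsigned int i;

	assert(cpu < nbits);	/* stands in for cpumask_check() */

	i = sketch_next_bit(bits, nbits, 0);
	if (i >= nbits)		/* empty mask: skip the second search */
		return i;
	if (i != cpu)
		return i;
	return sketch_next_bit(bits, nbits, i + 1);
}

/* Returns 1 if both forms agree for every 8-bit mask and cpu in 0..7. */
static int sketch_forms_agree(void)
{
	unsigned long m;
	unsigned int cpu;

	for (m = 0; m < 256; m++)
		for (cpu = 0; cpu < 8; cpu++)
			if (any_but_loop(&m, 8, cpu) !=
			    any_but_branch(&m, 8, cpu))
				return 0;
	return 1;
}
```

The exhaustive check covers only single-word masks of 8 bits, so it
shows that the two forms return the same index on small inputs; it says
nothing about generated code size or speed.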