linux-kernel - Re: [PATCH 3/4] bitops: squeeze even more out of fns()


Message-ID: <ZjX3mtNIp/ceCrKG@visitorckw-System-Product-Name>
Date: Sat, 4 May 2024 16:53:46 +0800
From: Kuan-Wei Chiu <visitorckw@...il.com>
To: Yury Norov <yury.norov@...il.com>
Cc: linux-kernel@...r.kernel.org,
	Rasmus Villemoes <linux@...musvillemoes.dk>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Chin-Chun Chen <n26122115@...ncku.edu.tw>,
	Ching-Chun Huang <jserv@...s.ncku.edu.tw>
Subject: Re: [PATCH 3/4] bitops: squeeze even more out of fns()

On Fri, May 03, 2024 at 09:13:32AM -0700, Yury Norov wrote:
> On Fri, May 03, 2024 at 10:19:10AM +0800, Kuan-Wei Chiu wrote:
> > +Cc Chin-Chun Chen & Ching-Chun (Jim) Huang
> > 
> > On Thu, May 02, 2024 at 04:32:03PM -0700, Yury Norov wrote:
> > > The function clears N-1 first set bits to find the N'th one with:
> > > 
> > > 	while (word && n--)
> > > 		word &= word - 1;
> > > 
> > > In the worst case, it would take 63 iterations.
> > > 
> > > Instead of a linear walk through the set bits, we can do a binary search
> > > by using hweight(). This would work even better on platforms supporting
> > > hardware-assisted hweight() - pretty much every modern arch.
> > > 
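For readers following the thread, the two approaches described above can be
sketched in user space as follows. This is a minimal illustration and not the
code from the patch: the names fns_linear() and fns_bsearch() are
hypothetical, a 64-bit word is assumed, and the GCC/Clang builtins
__builtin_popcountll() and __builtin_ctzll() stand in for the kernel's
hweight and __ffs() helpers.

```c
#include <stdint.h>

/*
 * Linear walk, as quoted in the patch description: clear the n lowest
 * set bits, then report the lowest remaining one (n counts from 0).
 * Returns 64 when the word has fewer than n + 1 set bits.
 */
static unsigned int fns_linear(uint64_t word, unsigned int n)
{
	while (word && n--)
		word &= word - 1;

	return word ? (unsigned int)__builtin_ctzll(word) : 64;
}

/*
 * Binary search by population count (an assumed shape, not the patch):
 * halve the window on every step and use the weight of the lower half
 * to decide which half holds the wanted bit.
 */
static unsigned int fns_bsearch(uint64_t word, unsigned int n)
{
	unsigned int pos = 0;
	unsigned int width;

	if (n >= (unsigned int)__builtin_popcountll(word))
		return 64;

	for (width = 32; width; width >>= 1) {
		uint64_t low = word & ((UINT64_C(1) << width) - 1);
		unsigned int w = (unsigned int)__builtin_popcountll(low);

		if (n >= w) {
			/* Wanted bit is in the upper half: skip the lower one. */
			n -= w;
			word >>= width;
			pos += width;
		} else {
			/* Wanted bit is in the lower half. */
			word = low;
		}
	}

	return pos;
}
```

The linear version loops up to n times, while the binary search always takes
six steps for a 64-bit word, which is consistent with the concern raised
below about small n.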
> > Chin-Chun once proposed a method similar to binary search combined with
> > hamming weight and discussed it privately with me and Jim. However,
> > Chin-Chun found that binary search would actually impair performance
> > when n is small. Since we are unsure about the typical range of n in
> > our actual workload, we have not yet proposed any relevant patches. If
> > considering only the overall benchmark results, this patch looks good
> > to me.
> 
> fns() is used only as a helper to find_nth_bit(). 
> 
> In the kernel, find_nth_bit() is used in
>  - bitmap_bitremap(),
>  - bitmap_remap(), and
>  - cpumask_local_spread() via sched_numa_find_nth_cpu()
> 
> with the bit to search for calculated as n = n % cpumask_weight(). This
> virtually implies a random, uniformly distributed n and word, just like
> in test_fns().
> 
> In rebalance_wq_table() in drivers/crypto/intel/iaa/iaa_crypto_main.c
> it's used like:
>         
>          for (cpu = 0; cpu < nr_cpus_per_node; cpu++) {
>                    int node_cpu = cpumask_nth(cpu, node_cpus);
>                    ...
>          }
> 
> This is an API abuse, and should be rewritten with for_each_cpu().
> 
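To illustrate the point about iteration: asking for the 0th, 1st, 2nd, ...
set bit in turn restarts the search on every call, whereas a single pass
visits each set bit exactly once. A rough user-space sketch, assuming a
64-bit word in place of a cpumask and a hypothetical helper name
collect_set_bits() (the kernel code would use for_each_cpu() instead):

```c
#include <stdint.h>

/*
 * Visit every set bit of mask once, lowest first, and store the bit
 * positions in out[] (which must have room for 64 entries).
 * Returns the number of positions stored.
 */
static unsigned int collect_set_bits(uint64_t mask, unsigned int *out)
{
	unsigned int count = 0;

	while (mask) {
		out[count++] = (unsigned int)__builtin_ctzll(mask);
		mask &= mask - 1;	/* clear the bit just visited */
	}

	return count;
}
```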
> In cpumask_any_housekeeping() at arch/x86/kernel/cpu/resctrl/internal.h
> it's used like:
> 
>         hk_cpu = cpumask_nth_andnot(0, mask, tick_nohz_full_mask);
>         if (hk_cpu == exclude_cpu)
>                 hk_cpu = cpumask_nth_andnot(1, mask, tick_nohz_full_mask);
> 
>         if (hk_cpu < nr_cpu_ids)
>                 cpu = hk_cpu;
> 
> This is another example of API abuse. We need to introduce a new
> helper, cpumask_andnot_any_but(), and use it like:
> 
>         hk_cpu = cpumask_andnot_any_but(exclude_cpu, mask, tick_nohz_full_mask);
>         if (hk_cpu < nr_cpu_ids)
>                  cpu = hk_cpu;
> 
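The helper proposed above did not exist at the time of this message, so the
following is only a guess at its semantics, modelled in user space: pick any
bit that is set in the first mask and clear in the second, skipping one
excluded index. The name mask_andnot_any_but(), the 64-bit masks, and the
NR_IDS limit are all assumptions made for illustration.

```c
#include <stdint.h>

#define NR_IDS 64u	/* assumed stand-in for nr_cpu_ids */

/*
 * Return the lowest index set in mask and clear in andnot_mask, other
 * than exclude. Returns NR_IDS when no such index exists.
 */
static unsigned int mask_andnot_any_but(unsigned int exclude,
					uint64_t mask, uint64_t andnot_mask)
{
	uint64_t candidates = mask & ~andnot_mask;

	/* An out-of-range exclude simply excludes nothing. */
	if (exclude < NR_IDS)
		candidates &= ~(UINT64_C(1) << exclude);

	return candidates ? (unsigned int)__builtin_ctzll(candidates) : NR_IDS;
}
```

Under these assumptions the result matches the quoted two-call sequence with
cpumask_nth_andnot(0, ...) and cpumask_nth_andnot(1, ...), but it needs only
one pass over the mask.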
> So, where the use of find_nth_bit() is legitimate, the parameters are
> distributed like in the test, and I would expect the real-life
> performance impact to be similar to the test.
> 
> Optimizing the helper for non-legitimate cases isn't worth the
> effort.
>
Got it, thank you for your detailed explanation :)

Reviewed-by: Kuan-Wei Chiu <visitorckw@...il.com>

Regards,
Kuan-Wei