lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZjEsJu3L/pfDcl09@yury-ThinkPad>
Date: Tue, 30 Apr 2024 10:36:38 -0700
From: Yury Norov <yury.norov@...il.com>
To: Kuan-Wei Chiu <visitorckw@...il.com>
Cc: akpm@...ux-foundation.org, linux@...musvillemoes.dk,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 2/2] bitops: Optimize fns() for improved performance

On Tue, Apr 30, 2024 at 01:49:12PM +0800, Kuan-Wei Chiu wrote:
> The current fns() repeatedly uses __ffs() to find the index of the
> least significant bit and then clears the corresponding bit using
> __clear_bit(). The method for clearing the least significant bit can be
> optimized by using word &= word - 1 instead.
> 
> Typically, the execution time of one __ffs() plus one __clear_bit() is
> longer than that of a bitwise AND operation and a subtraction. To
> improve performance, the loop for clearing the least significant bit
> has been replaced with word &= word - 1, followed by a single __ffs()
> operation to obtain the answer. This change reduces the number of
> __ffs() iterations from n to just one, enhancing overall performance.
> 
> This change speeds up the find_nth_bit() in the find_bit benchmark by
> approximately ~1.26 times:

26% sounds better.

> Before:
> find_nth_bit:                  4254313 ns,  16525 iterations
> 
> After:
> find_nth_bit:                  3362863 ns,  16501 iterations
> 
> The following microbenchmark data, conducted on my x86-64 machine,
> shows the execution time (in microseconds) required for 1000000 test
> data generated by get_random_u64() and executed by fns() under
> different values of n:
> 
> +-----+---------------+---------------+
> |  n  |   time_old    |   time_new    |
> +-----+---------------+---------------+
> |  0  |     29194     |     25878     |
> |  1  |     25510     |     25497     |
> |  2  |     27836     |     25721     |
> |  3  |     30140     |     25673     |
> |  4  |     32569     |     25426     |
> |  5  |     34792     |     25690     |
> |  6  |     37117     |     25651     |
> |  7  |     39742     |     25383     |
> |  8  |     42360     |     25657     |
> |  9  |     44672     |     25897     |
> | 10  |     47237     |     25819     |
> | 11  |     49884     |     26530     |
> | 12  |     51864     |     26647     |
> | 13  |     54265     |     28915     |
> | 14  |     56440     |     28373     |
> | 15  |     58839     |     28616     |
> | 16  |     62383     |     29128     |
> | 17  |     64257     |     30041     |
> | 18  |     66805     |     29773     |
> | 19  |     69368     |     33203     |
> | 20  |     72942     |     33688     |
> | 21  |     77006     |     34518     |
> | 22  |     80926     |     34298     |
> | 23  |     85723     |     35586     |
> | 24  |     90324     |     36376     |
> | 25  |     95992     |     37465     |
> | 26  |    101101     |     37599     |
> | 27  |    106520     |     37466     |
> | 28  |    113287     |     38163     |
> | 29  |    120552     |     38810     |
> | 30  |    128040     |     39373     |
> | 31  |    135624     |     40500     |
> | 32  |    142580     |     40343     |
> | 33  |    148915     |     40460     |
> | 34  |    154005     |     41294     |
> | 35  |    157996     |     41730     |
> | 36  |    160806     |     41523     |
> | 37  |    162975     |     42088     |
> | 38  |    163426     |     41530     |
> | 39  |    164872     |     41789     |
> | 40  |    164477     |     42505     |
> | 41  |    164758     |     41879     |
> | 42  |    164182     |     41415     |
> | 43  |    164842     |     42119     |
> | 44  |    164881     |     42297     |
> | 45  |    164870     |     42145     |
> | 46  |    164673     |     42066     |
> | 47  |    164616     |     42051     |
> | 48  |    165055     |     41902     |
> | 49  |    164847     |     41862     |
> | 50  |    165171     |     41960     |
> | 51  |    164851     |     42089     |
> | 52  |    164763     |     41717     |
> | 53  |    164635     |     42154     |
> | 54  |    164757     |     41983     |
> | 55  |    165095     |     41419     |
> | 56  |    164641     |     42381     |
> | 57  |    164601     |     41654     |
> | 58  |    164864     |     41834     |
> | 59  |    164594     |     41920     |
> | 60  |    165207     |     42020     |
> | 61  |    165056     |     41185     |
> | 62  |    165160     |     41722     |
> | 63  |    164923     |     41702     |
> | 64  |    164777     |     41880     |
> +-----+---------------+---------------+

Please, just a total gross in the commit message.
 
> Signed-off-by: Kuan-Wei Chiu <visitorckw@...il.com>
> ---
> 
> Changes in v2:
> - Change the loop in fns() by counting down from n to 0.
> - Add find_bit benchmark result for find_nth_bit in commit message.
> 
>  include/linux/bitops.h | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 2ba557e067fe..57ecef354f47 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -254,16 +254,10 @@ static inline unsigned long __ffs64(u64 word)
>   */
>  static inline unsigned long fns(unsigned long word, unsigned int n)
>  {
> -	unsigned int bit;
> +	while (word && n--)
> +		word &= word - 1;
>  
> -	while (word) {
> -		bit = __ffs(word);
> -		if (n-- == 0)
> -			return bit;
> -		__clear_bit(bit, &word);
> -	}
> -
> -	return BITS_PER_LONG;
> +	return word ? __ffs(word) : BITS_PER_LONG;
>  }
>  
>  /**
> -- 
> 2.34.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ