lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <267bf21f-a42d-4209-8348-e91d45c6e463@ghiti.fr>
Date: Wed, 23 Apr 2025 08:57:20 +0200
From: Alexandre Ghiti <alex@...ti.fr>
To: Kuan-Wei Chiu <visitorckw@...il.com>
Cc: paul.walmsley@...ive.com, palmer@...belt.com, aou@...s.berkeley.edu,
 jserv@...s.ncku.edu.tw, eleanor15x@...il.com,
 linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
 Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC PATCH] riscv: Optimize gcd() performance by selecting
 CPU_NO_EFFICIENT_FFS

Hi Kuan-Wei,

On 04/04/2025 15:54, Kuan-Wei Chiu wrote:
> +Cc Andrew, since this might touch lib/math/gcd.c
>
> On Fri, Mar 28, 2025 at 03:07:36PM +0100, Alexandre Ghiti wrote:
>> Hi Kuan-Wei,
>>
>> First sorry for the late review.
>>
>> On 17/02/2025 02:37, Kuan-Wei Chiu wrote:
>>> When the Zbb extension is not supported, ffs() falls back to a software
>>> implementation instead of leveraging the hardware ctz instruction for
>>> fast computation. In such cases, selecting CPU_NO_EFFICIENT_FFS
>>> optimizes the efficiency of gcd().
>>>
>>> The implementation of gcd() depends on the CPU_NO_EFFICIENT_FFS option.
>>> With hardware support for ffs, the binary GCD algorithm is used.
>>> Without it, the odd-even GCD algorithm is employed for better
>>> performance.
>>>
>>> Co-developed-by: Yu-Chun Lin <eleanor15x@...il.com>
>>> Signed-off-by: Yu-Chun Lin <eleanor15x@...il.com>
>>> Signed-off-by: Kuan-Wei Chiu <visitorckw@...il.com>
>>> ---
>>> Although selecting NO_EFFICIENT_FFS seems reasonable without ctz
>>> instructions, this patch hasn't been tested on real hardware. We'd
>>> greatly appreciate it if someone could help test and provide
>>> performance numbers!
>>>
>>>    arch/riscv/Kconfig | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index 7612c52e9b1e..2dd3699ad09b 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -91,6 +91,7 @@ config RISCV
>>>    	select CLINT_TIMER if RISCV_M_MODE
>>>    	select CLONE_BACKWARDS
>>>    	select COMMON_CLK
>>> +	select CPU_NO_EFFICIENT_FFS if !RISCV_ISA_ZBB
>>>    	select CPU_PM if CPU_IDLE || HIBERNATION || SUSPEND
>>>    	select EDAC_SUPPORT
>>>    	select FRAME_POINTER if PERF_EVENTS || (FUNCTION_TRACER && !DYNAMIC_FTRACE)
>>
>> So your patch is correct. But a kernel built with RISCV_ISA_ZBB does not
>> mean the platform supports zbb and in that case, we'd still use the slow
>> version of gcd().
>>
>> Then I would use static keys instead, can you try to come up with a patch
>> that does that?
>>
> Here's my current thought: I'd like to add a static key named
> efficient_ffs_key in gcd.c, and possibly call
> static_branch_disable(&efficient_ffs_key) somewhere under arch/riscv/
> when RISCV_ISA_ZBB is enabled but the Zbb extension is not detected at
> runtime.
>
> However, I'm new to the RISC-V kernel code and not sure where would be
> the most appropriate place to insert that static_branch_disable() call.
> Suggestions are very welcome!


Sorry for the late answer, I missed your message.

So we put all of those initializations that depend on the discovery of 
extensions at the end of setup_arch() 
(https://elixir.bootlin.com/linux/v6.14.3/source/arch/riscv/kernel/setup.c#L334).

Thanks,

Alex


>
> Below is the diff for context.
>
> Regards,
> Kuan-Wei
>
> diff --git a/lib/math/gcd.c b/lib/math/gcd.c
> index e3b042214d1b..514b8a86b461 100644
> --- a/lib/math/gcd.c
> +++ b/lib/math/gcd.c
> @@ -2,6 +2,7 @@
>   #include <linux/kernel.h>
>   #include <linux/gcd.h>
>   #include <linux/export.h>
> +#include <linux/jump_label.h>
>
>   /*
>    * This implements the binary GCD algorithm. (Often attributed to Stein,
> @@ -11,6 +12,8 @@
>    * has decent hardware division.
>    */
>
> +DEFINE_STATIC_KEY_TRUE(efficient_ffs_key);
> +
>   #if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
>
>   /* If __ffs is available, the even/odd algorithm benchmarks slower. */
> @@ -20,7 +23,7 @@
>    * @a: first value
>    * @b: second value
>    */
> -unsigned long gcd(unsigned long a, unsigned long b)
> +static unsigned long gcd_binary(unsigned long a, unsigned long b)
>   {
>   	unsigned long r = a | b;
>
> @@ -44,7 +47,7 @@ unsigned long gcd(unsigned long a, unsigned long b)
>   	}
>   }
>
> -#else
> +#endif
>
>   /* If normalization is done by loops, the even/odd algorithm is a win. */
>   unsigned long gcd(unsigned long a, unsigned long b)
> @@ -54,6 +57,11 @@ unsigned long gcd(unsigned long a, unsigned long b)
>   	if (!a || !b)
>   		return r;
>
> +#if !defined(CONFIG_CPU_NO_EFFICIENT_FFS)
> +	if (static_branch_likely(&efficient_ffs_key))
> +		return binary_gcd(a, b);
> +#endif
> +
>   	/* Isolate lsbit of r */
>   	r &= -r;
>
> @@ -80,6 +88,4 @@ unsigned long gcd(unsigned long a, unsigned long b)
>   	}
>   }
>
> -#endif
> -
>   EXPORT_SYMBOL_GPL(gcd);

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ