linux-kernel - Re: [PATCH v3 7/8] riscv: Add parameter for skipping access speed tests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1b7e3d0f-0526-4afb-9f7a-2695e4166a9b@ghiti.fr>
Date: Mon, 17 Mar 2025 15:39:01 +0100
From: Alexandre Ghiti <alex@...ti.fr>
To: Andrew Jones <ajones@...tanamicro.com>, linux-riscv@...ts.infradead.org,
 linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org
Cc: paul.walmsley@...ive.com, palmer@...belt.com, charlie@...osinc.com,
 cleger@...osinc.com, Anup Patel <apatel@...tanamicro.com>, corbet@....net
Subject: Re: [PATCH v3 7/8] riscv: Add parameter for skipping access speed
 tests

Hi Drew,

On 04/03/2025 13:00, Andrew Jones wrote:
> Allow skipping scalar and vector unaligned access speed tests. This
> is useful for testing alternative code paths and to skip the tests in
> environments where they run too slowly. All CPUs must have the same
> unaligned access speed.

I'm not a big fan of the command line parameter, this is not where we 
should push uarch decisions because there could be many other in the 
future, the best solution to me should be in DT/ACPI and since the DT 
folks, according to Palmer, shut down this solution, it remains using an 
extension.

I have been reading a bit about unaligned accesses. Zicclsm was 
described as "Even though mandated, misaligned loads and stores might 
execute extremely slowly. Standard software distributions should assume 
their existence only for correctness, not for performance." in rva20/22 
but *not* in rva23. So what about using this "hole" and consider that a 
platform that *advertises* Zicclsm means its unaligned accesses are 
fast? After internal discussion, It actually does not make sense to 
advertise Zicclsm if the platform accesses are slow right?

arm64 for example considers that armv8 has fast unaligned accesses and 
can then enable HAVE_EFFICIENT_ALIGNED_ACCESS in the kernel, even though 
some uarchs are slow. Distros will very likely use rva23 as baseline so 
they will enable Zicclsm which would allow us to take advantage of this 
too, without this, we lose a lot of perf improvement in the kernel, see 
https://lore.kernel.org/lkml/20231225044207.3821-1-jszhang@kernel.org/.

Or we could have a new named feature for this, even though it's weird to 
have a named feature which would basically  mean "Zicclsm is fast". We 
don't have, for example, a named feature to say "Zicboz is fast" but 
given the vague wording in the profile spec, maybe we can ask for one in 
that case?

Sorry for the late review and for triggering this debate...

Thanks,

Alex


>
> The code movement is because we now need the scalar cpu hotplug
> callback to always run, so we need to bring it and its supporting
> functions out of CONFIG_RISCV_PROBE_UNALIGNED_ACCESS.
>
> Signed-off-by: Andrew Jones <ajones@...tanamicro.com>
> ---
>   arch/riscv/kernel/unaligned_access_speed.c | 187 +++++++++++++--------
>   1 file changed, 121 insertions(+), 66 deletions(-)
>
> diff --git a/arch/riscv/kernel/unaligned_access_speed.c b/arch/riscv/kernel/unaligned_access_speed.c
> index d9d4ca1fadc7..18e334549544 100644
> --- a/arch/riscv/kernel/unaligned_access_speed.c
> +++ b/arch/riscv/kernel/unaligned_access_speed.c
> @@ -24,8 +24,12 @@
>   DEFINE_PER_CPU(long, misaligned_access_speed) = RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN;
>   DEFINE_PER_CPU(long, vector_misaligned_access) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
>   
> -#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
> +static long unaligned_scalar_speed_param = RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN;
> +static long unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN;
> +
>   static cpumask_t fast_misaligned_access;
> +
> +#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
>   static int check_unaligned_access(void *param)
>   {
>   	int cpu = smp_processor_id();
> @@ -130,6 +134,50 @@ static void __init check_unaligned_access_nonboot_cpu(void *param)
>   		check_unaligned_access(pages[cpu]);
>   }
>   
> +/* Measure unaligned access speed on all CPUs present at boot in parallel. */
> +static void __init check_unaligned_access_speed_all_cpus(void)
> +{
> +	unsigned int cpu;
> +	unsigned int cpu_count = num_possible_cpus();
> +	struct page **bufs = kcalloc(cpu_count, sizeof(*bufs), GFP_KERNEL);
> +
> +	if (!bufs) {
> +		pr_warn("Allocation failure, not measuring misaligned performance\n");
> +		return;
> +	}
> +
> +	/*
> +	 * Allocate separate buffers for each CPU so there's no fighting over
> +	 * cache lines.
> +	 */
> +	for_each_cpu(cpu, cpu_online_mask) {
> +		bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
> +		if (!bufs[cpu]) {
> +			pr_warn("Allocation failure, not measuring misaligned performance\n");
> +			goto out;
> +		}
> +	}
> +
> +	/* Check everybody except 0, who stays behind to tend jiffies. */
> +	on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
> +
> +	/* Check core 0. */
> +	smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
> +
> +out:
> +	for_each_cpu(cpu, cpu_online_mask) {
> +		if (bufs[cpu])
> +			__free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER);
> +	}
> +
> +	kfree(bufs);
> +}
> +#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
> +static void __init check_unaligned_access_speed_all_cpus(void)
> +{
> +}
> +#endif
> +
>   DEFINE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key);
>   
>   static void modify_unaligned_access_branches(cpumask_t *mask, int weight)
> @@ -188,21 +236,29 @@ arch_initcall_sync(lock_and_set_unaligned_access_static_branch);
>   
>   static int riscv_online_cpu(unsigned int cpu)
>   {
> -	static struct page *buf;
> -
>   	/* We are already set since the last check */
> -	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN)
> +	if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN) {
> +		goto exit;
> +	} else if (unaligned_scalar_speed_param != RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN) {
> +		per_cpu(misaligned_access_speed, cpu) = unaligned_scalar_speed_param;
>   		goto exit;
> -
> -	check_unaligned_access_emulated(NULL);
> -	buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
> -	if (!buf) {
> -		pr_warn("Allocation failure, not measuring misaligned performance\n");
> -		return -ENOMEM;
>   	}
>   
> -	check_unaligned_access(buf);
> -	__free_pages(buf, MISALIGNED_BUFFER_ORDER);
> +#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
> +	{
> +		static struct page *buf;
> +
> +		check_unaligned_access_emulated(NULL);
> +		buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
> +		if (!buf) {
> +			pr_warn("Allocation failure, not measuring misaligned performance\n");
> +			return -ENOMEM;
> +		}
> +
> +		check_unaligned_access(buf);
> +		__free_pages(buf, MISALIGNED_BUFFER_ORDER);
> +	}
> +#endif
>   
>   exit:
>   	set_unaligned_access_static_branches();
> @@ -217,50 +273,6 @@ static int riscv_offline_cpu(unsigned int cpu)
>   	return 0;
>   }
>   
> -/* Measure unaligned access speed on all CPUs present at boot in parallel. */
> -static void __init check_unaligned_access_speed_all_cpus(void)
> -{
> -	unsigned int cpu;
> -	unsigned int cpu_count = num_possible_cpus();
> -	struct page **bufs = kcalloc(cpu_count, sizeof(*bufs), GFP_KERNEL);
> -
> -	if (!bufs) {
> -		pr_warn("Allocation failure, not measuring misaligned performance\n");
> -		return;
> -	}
> -
> -	/*
> -	 * Allocate separate buffers for each CPU so there's no fighting over
> -	 * cache lines.
> -	 */
> -	for_each_cpu(cpu, cpu_online_mask) {
> -		bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER);
> -		if (!bufs[cpu]) {
> -			pr_warn("Allocation failure, not measuring misaligned performance\n");
> -			goto out;
> -		}
> -	}
> -
> -	/* Check everybody except 0, who stays behind to tend jiffies. */
> -	on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1);
> -
> -	/* Check core 0. */
> -	smp_call_on_cpu(0, check_unaligned_access, bufs[0], true);
> -
> -out:
> -	for_each_cpu(cpu, cpu_online_mask) {
> -		if (bufs[cpu])
> -			__free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER);
> -	}
> -
> -	kfree(bufs);
> -}
> -#else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */
> -static void __init check_unaligned_access_speed_all_cpus(void)
> -{
> -}
> -#endif
> -
>   #ifdef CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS
>   static void check_vector_unaligned_access(struct work_struct *work __always_unused)
>   {
> @@ -372,8 +384,8 @@ static int __init vec_check_unaligned_access_speed_all_cpus(void *unused __alway
>   
>   static int riscv_online_cpu_vec(unsigned int cpu)
>   {
> -	if (!has_vector()) {
> -		per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
> +	if (unaligned_vector_speed_param != RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN) {
> +		per_cpu(vector_misaligned_access, cpu) = unaligned_vector_speed_param;
>   		return 0;
>   	}
>   
> @@ -388,30 +400,73 @@ static int riscv_online_cpu_vec(unsigned int cpu)
>   	return 0;
>   }
>   
> +static const char * const speed_str[] __initconst = { NULL, NULL, "slow", "fast", "unsupported" };
> +
> +static int __init set_unaligned_scalar_speed_param(char *str)
> +{
> +	if (!strcmp(str, speed_str[RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW]))
> +		unaligned_scalar_speed_param = RISCV_HWPROBE_MISALIGNED_SCALAR_SLOW;
> +	else if (!strcmp(str, speed_str[RISCV_HWPROBE_MISALIGNED_SCALAR_FAST]))
> +		unaligned_scalar_speed_param = RISCV_HWPROBE_MISALIGNED_SCALAR_FAST;
> +	else if (!strcmp(str, speed_str[RISCV_HWPROBE_MISALIGNED_SCALAR_UNSUPPORTED]))
> +		unaligned_scalar_speed_param = RISCV_HWPROBE_MISALIGNED_SCALAR_UNSUPPORTED;
> +	else
> +		return -EINVAL;
> +
> +	return 1;
> +}
> +__setup("unaligned_scalar_speed=", set_unaligned_scalar_speed_param);
> +
> +static int __init set_unaligned_vector_speed_param(char *str)
> +{
> +	if (!strcmp(str, speed_str[RISCV_HWPROBE_MISALIGNED_VECTOR_SLOW]))
> +		unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_SLOW;
> +	else if (!strcmp(str, speed_str[RISCV_HWPROBE_MISALIGNED_VECTOR_FAST]))
> +		unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_FAST;
> +	else if (!strcmp(str, speed_str[RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED]))
> +		unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
> +	else
> +		return -EINVAL;
> +
> +	return 1;
> +}
> +__setup("unaligned_vector_speed=", set_unaligned_vector_speed_param);
> +
>   static int __init check_unaligned_access_all_cpus(void)
>   {
>   	int cpu;
>   
> -	if (!check_unaligned_access_emulated_all_cpus())
> +	if (unaligned_scalar_speed_param == RISCV_HWPROBE_MISALIGNED_SCALAR_UNKNOWN &&
> +	    !check_unaligned_access_emulated_all_cpus()) {
>   		check_unaligned_access_speed_all_cpus();
> -
> -	if (!has_vector()) {
> +	} else {
> +		pr_info("scalar unaligned access speed set to '%s' by command line\n",
> +			speed_str[unaligned_scalar_speed_param]);
>   		for_each_online_cpu(cpu)
> -			per_cpu(vector_misaligned_access, cpu) = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
> -	} else if (!check_vector_unaligned_access_emulated_all_cpus() &&
> -		   IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
> +			per_cpu(misaligned_access_speed, cpu) = unaligned_scalar_speed_param;
> +	}
> +
> +	if (!has_vector())
> +		unaligned_vector_speed_param = RISCV_HWPROBE_MISALIGNED_VECTOR_UNSUPPORTED;
> +
> +	if (unaligned_vector_speed_param == RISCV_HWPROBE_MISALIGNED_VECTOR_UNKNOWN &&
> +	    !check_vector_unaligned_access_emulated_all_cpus() &&
> +	    IS_ENABLED(CONFIG_RISCV_PROBE_VECTOR_UNALIGNED_ACCESS)) {
>   		kthread_run(vec_check_unaligned_access_speed_all_cpus,
>   			    NULL, "vec_check_unaligned_access_speed_all_cpus");
> +	} else {
> +		pr_info("vector unaligned access speed set to '%s' by command line\n",
> +			speed_str[unaligned_vector_speed_param]);
> +		for_each_online_cpu(cpu)
> +			per_cpu(vector_misaligned_access, cpu) = unaligned_vector_speed_param;
>   	}
>   
>   	/*
>   	 * Setup hotplug callbacks for any new CPUs that come online or go
>   	 * offline.
>   	 */
> -#ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS
>   	cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
>   				  riscv_online_cpu, riscv_offline_cpu);
> -#endif
>   	cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online",
>   				  riscv_online_cpu_vec, NULL);
>