linux-kernel - Re: [PATCH v4 02/12] riscv: Add vector extension XOR implementation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2289969.yH7iMFiWxO@basile.remlab.net>
Date:   Tue, 11 Jul 2023 20:33:50 +0300
From:   Rémi Denis-Courmont <remi@...lab.net>
To:     linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 02/12] riscv: Add vector extension XOR implementation

Le tiistaina 11. heinäkuuta 2023, 18.37.33 EEST Heiko Stuebner a écrit :
> diff --git a/arch/riscv/lib/xor.S b/arch/riscv/lib/xor.S
> new file mode 100644
> index 000000000000..3bc059e18171
> --- /dev/null
> +++ b/arch/riscv/lib/xor.S
> @@ -0,0 +1,81 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2021 SiFive
> + */
> +#include <linux/linkage.h>
> +#include <asm-generic/export.h>
> +#include <asm/asm.h>
> +
> +ENTRY(xor_regs_2_)
> +	vsetvli a3, a0, e8, m8, ta, ma

AFAICT, so far, Linux only uses `vsetvli` to save/restore/flush vectors, and 
that's, of course, with LMUL=8, so that's not really telling much anything. 
This function could be the first actual vector optimisation in kernel if/when 
it gets merged.

Should the same group multiplier be used for "actual" vector loops throughout 
the kernel? I've seen conflicting advises or opinions here. Should kernel code 
always use the maximum possible LMUL, depending on register pressure of the 
loop? Or will that just increase latency with no bandwidth gains compared to, 
say, LMUL=1 or LMUL=2?

> +	vle8.v v0, (a1)
> +	vle8.v v8, (a2)
> +	sub a0, a0, a3
> +	vxor.vv v16, v0, v8
> +	add a2, a2, a3
> +	vse8.v v16, (a1)
> +	add a1, a1, a3
> +	bnez a0, xor_regs_2_
> +	ret
> +END(xor_regs_2_)
> +EXPORT_SYMBOL(xor_regs_2_)
> +
> +ENTRY(xor_regs_3_)
> +	vsetvli a4, a0, e8, m8, ta, ma
> +	vle8.v v0, (a1)
> +	vle8.v v8, (a2)
> +	sub a0, a0, a4
> +	vxor.vv v0, v0, v8
> +	vle8.v v16, (a3)
> +	add a2, a2, a4
> +	vxor.vv v16, v0, v16
> +	add a3, a3, a4
> +	vse8.v v16, (a1)
> +	add a1, a1, a4
> +	bnez a0, xor_regs_3_
> +	ret
> +END(xor_regs_3_)
> +EXPORT_SYMBOL(xor_regs_3_)
> +
> +ENTRY(xor_regs_4_)
> +	vsetvli a5, a0, e8, m8, ta, ma
> +	vle8.v v0, (a1)
> +	vle8.v v8, (a2)
> +	sub a0, a0, a5
> +	vxor.vv v0, v0, v8
> +	vle8.v v16, (a3)
> +	add a2, a2, a5
> +	vxor.vv v0, v0, v16
> +	vle8.v v24, (a4)
> +	add a3, a3, a5
> +	vxor.vv v16, v0, v24
> +	add a4, a4, a5
> +	vse8.v v16, (a1)
> +	add a1, a1, a5
> +	bnez a0, xor_regs_4_
> +	ret
> +END(xor_regs_4_)
> +EXPORT_SYMBOL(xor_regs_4_)
> +
> +ENTRY(xor_regs_5_)
> +	vsetvli a6, a0, e8, m8, ta, ma
> +	vle8.v v0, (a1)
> +	vle8.v v8, (a2)
> +	sub a0, a0, a6
> +	vxor.vv v0, v0, v8
> +	vle8.v v16, (a3)
> +	add a2, a2, a6
> +	vxor.vv v0, v0, v16
> +	vle8.v v24, (a4)
> +	add a3, a3, a6
> +	vxor.vv v0, v0, v24
> +	vle8.v v8, (a5)
> +	add a4, a4, a6
> +	vxor.vv v16, v0, v8
> +	add a5, a5, a6
> +	vse8.v v16, (a1)
> +	add a1, a1, a6
> +	bnez a0, xor_regs_5_
> +	ret
> +END(xor_regs_5_)
> +EXPORT_SYMBOL(xor_regs_5_)


-- 
レミ・デニ-クールモン
http://www.remlab.net/