lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250205163012.GB1474@sol.localdomain>
Date: Wed, 5 Feb 2025 08:30:12 -0800
From: Eric Biggers <ebiggers@...nel.org>
To: Zhihang Shao <zhihang.shao.iscas@...il.com>
Cc: linux-crypto@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-riscv@...ts.infradead.org, ardb@...nel.org
Subject: Re: [PATCH v3] riscv: Optimize crct10dif with zbc extension

On Wed, Feb 05, 2025 at 02:58:15PM +0800, Zhihang Shao wrote:
> The current CRC-T10DIF algorithm on RISC-V platform is based on
> table-lookup optimization.
> Given the previous work on optimizing crc32 calculations with zbc
> extension, it is believed that this will be equally effective for
> accelerating crc-t10dif.
> 
> Therefore this patch offers an implementation of crc-t10dif using zbc
> extension. This can detect whether the current runtime environment
> supports zbc feature and, if so, uses it to accelerate crc-t10dif
> calculations.
> 
> This patch is updated due to the patchset of updating kernel's
> CRC-T10DIF library in 6.14, which is finished by Eric Biggers.
> Also, I used crc_kunit.c to test the performance of crc-t10dif optimized
> by crc extension.
> 
> Signed-off-by: Zhihang Shao <zhihang.shao.iscas@...il.com>
> ---
>  arch/riscv/Kconfig                |   1 +
>  arch/riscv/lib/Makefile           |   1 +
>  arch/riscv/lib/crc-t10dif-riscv.c | 132 ++++++++++++++++++++++++++++++
>  3 files changed, 134 insertions(+)
>  create mode 100644 arch/riscv/lib/crc-t10dif-riscv.c

Acked-by: Eric Biggers <ebiggers@...nel.org>
Tested-by: Eric Biggers <ebiggers@...nel.org>

This can go through the riscv tree.

Some minor comments below.

> +static inline u64 crct10dif_prep(u16 crc, unsigned long const *ptr)
> +{
> +	return ((u64)crc << 48) ^ (__force u64)__cpu_to_be64(*ptr);
> +}
> +
> +#elif __riscv_xlen == 32
> +#define STEP_ORDER 2
> +#define CRCT10DIF_POLY_QT_BE 0xf65a57f8
> +
> +static inline u32 crct10dif_prep(u16 crc, unsigned long const *ptr)
> +{
> +	return ((u32)crc << 16) ^ (__force u32)__cpu_to_be32(*ptr);
> +}

Maybe use 'const __be64 *' and 'const __be32 *' for the pointer, and use
be64_to_cpu() and be32_to_cpu().  Then the __force cast won't be needed.

> +static inline u16 crct10dif_zbc(unsigned long s)
> +{
> +	u16 crc;
> +
> +	asm volatile   (".option push\n"
> +			".option arch,+zbc\n"
> +			"clmulh %0, %1, %2\n"
> +			"xor    %0, %0, %1\n"
> +			"clmul  %0, %0, %3\n"
> +			".option pop\n"
> +			: "=&r" (crc)
> +			: "r"(s),
> +			  "r"(CRCT10DIF_POLY_QT_BE),
> +			  "r"(CRCT10DIF_POLY)
> +			:);
> +
> +	return crc;
> +}

A comment mentioning that this is using Barrett reduction would be helpful.

BTW, this is fine for an initial implementation, but eventually we'll probably
want to make it fold multiple words at a time to take advantage of
instruction-level parallelism, like the x86 PCLMULQDQ code does.  We also might
be able to share code among all the CRC variants like what I'm doing for x86.

> +#define STEP (1 << STEP_ORDER)
> +#define OFFSET_MASK (STEP - 1)

You can just remove the above #defines and use 'sizeof(unsigned long)' directly.
You can even use the % and / operators since the compiler optimizes them.
arch/x86/lib/crc32-glue.c does this.

> +	p = (unsigned char const *)p_ul;

Please use 'const u8 *'.

Thanks!

- Eric

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ