Message-ID: <20250205163012.GB1474@sol.localdomain>
Date: Wed, 5 Feb 2025 08:30:12 -0800
From: Eric Biggers <ebiggers@...nel.org>
To: Zhihang Shao <zhihang.shao.iscas@...il.com>
Cc: linux-crypto@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-riscv@...ts.infradead.org, ardb@...nel.org
Subject: Re: [PATCH v3] riscv: Optimize crct10dif with zbc extension
On Wed, Feb 05, 2025 at 02:58:15PM +0800, Zhihang Shao wrote:
> The current CRC-T10DIF algorithm on RISC-V platform is based on
> table-lookup optimization.
> Given the previous work on optimizing crc32 calculations with the zbc
> extension, the same approach is expected to be similarly effective for
> accelerating crc-t10dif.
>
> Therefore this patch offers an implementation of crc-t10dif using the
> zbc extension. It detects whether the current runtime environment
> supports the zbc extension and, if so, uses it to accelerate
> crc-t10dif calculations.
>
> This patch has been updated for the patchset by Eric Biggers that
> reworked the kernel's CRC-T10DIF library in 6.14.
> Also, I used crc_kunit.c to test the performance of crc-t10dif
> optimized with the zbc extension.
>
> Signed-off-by: Zhihang Shao <zhihang.shao.iscas@...il.com>
> ---
> arch/riscv/Kconfig | 1 +
> arch/riscv/lib/Makefile | 1 +
> arch/riscv/lib/crc-t10dif-riscv.c | 132 ++++++++++++++++++++++++++++++
> 3 files changed, 134 insertions(+)
> create mode 100644 arch/riscv/lib/crc-t10dif-riscv.c
Acked-by: Eric Biggers <ebiggers@...nel.org>
Tested-by: Eric Biggers <ebiggers@...nel.org>
This can go through the riscv tree.
Some minor comments below.
> +static inline u64 crct10dif_prep(u16 crc, unsigned long const *ptr)
> +{
> + return ((u64)crc << 48) ^ (__force u64)__cpu_to_be64(*ptr);
> +}
> +
> +#elif __riscv_xlen == 32
> +#define STEP_ORDER 2
> +#define CRCT10DIF_POLY_QT_BE 0xf65a57f8
> +
> +static inline u32 crct10dif_prep(u16 crc, unsigned long const *ptr)
> +{
> + return ((u32)crc << 16) ^ (__force u32)__cpu_to_be32(*ptr);
> +}
Maybe use 'const __be64 *' and 'const __be32 *' for the pointer, and use
be64_to_cpu() and be32_to_cpu(). Then the __force cast won't be needed.
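E.g., something like this (untested, just to show the shape):

	static inline u64 crct10dif_prep(u16 crc, const __be64 *ptr)
	{
		return ((u64)crc << 48) ^ be64_to_cpu(*ptr);
	}

... and likewise with 'const __be32 *' and be32_to_cpu() in the 32-bit
case.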
> +static inline u16 crct10dif_zbc(unsigned long s)
> +{
> + u16 crc;
> +
> + asm volatile (".option push\n"
> + ".option arch,+zbc\n"
> + "clmulh %0, %1, %2\n"
> + "xor %0, %0, %1\n"
> + "clmul %0, %0, %3\n"
> + ".option pop\n"
> + : "=&r" (crc)
> + : "r"(s),
> + "r"(CRCT10DIF_POLY_QT_BE),
> + "r"(CRCT10DIF_POLY)
> + :);
> +
> + return crc;
> +}
A comment mentioning that this is using Barrett reduction would be helpful.
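Something along these lines, perhaps (rough wording only; please
double-check it against the actual derivation):

	/*
	 * Reduce 's' modulo the CRC-T10DIF generator polynomial using
	 * Barrett reduction: first compute an estimate of the quotient
	 * of 's' divided by the polynomial via a carry-less multiply
	 * with the precomputed constant CRCT10DIF_POLY_QT_BE, then
	 * multiply that quotient back by the polynomial; what is left
	 * in the low bits of the result is the remainder, i.e. the CRC.
	 */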
BTW, this is fine for an initial implementation, but eventually we'll probably
want to make it fold multiple words at a time to take advantage of
instruction-level parallelism, like the x86 PCLMULQDQ code does. We also might
be able to share code among all the CRC variants like what I'm doing for x86.
> +#define STEP (1 << STEP_ORDER)
> +#define OFFSET_MASK (STEP - 1)
You can just remove the above #defines and use 'sizeof(unsigned long)' directly.
You can even use the % and / operators; since sizeof(unsigned long) is a power
of two, the compiler optimizes them into a mask and a shift.
arch/x86/lib/crc32-glue.c does this.
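I.e., roughly (variable names are just for illustration):

	size_t num_longs = len / sizeof(unsigned long);
	size_t tail = len % sizeof(unsigned long);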
> + p = (unsigned char const *)p_ul;
Please use 'const u8 *'.
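I.e.:

	p = (const u8 *)p_ul;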
Thanks!
- Eric