linux-kernel - Re: [PATCH V5] raid6: Add RISC-V SIMD syndrome and recovery calculations

Message-ID: <CAAfSe-vD_37uihLjGwOqQKnyKJaJ36OwxDeocMOhK4s6-cpzAA@mail.gmail.com>
Date: Thu, 8 May 2025 15:14:50 +0800
From: Chunyan Zhang <zhang.lyra@...il.com>
To: Palmer Dabbelt <palmer@...belt.com>
Cc: zhangchunyan@...as.ac.cn, Paul Walmsley <paul.walmsley@...ive.com>, 
	aou@...s.berkeley.edu, Charlie Jenkins <charlie@...osinc.com>, song@...nel.org, 
	yukuai3@...wei.com, linux-riscv@...ts.infradead.org, 
	linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V5] raid6: Add RISC-V SIMD syndrome and recovery calculations

Hi Palmer,

On Mon, 31 Mar 2025 at 23:55, Palmer Dabbelt <palmer@...belt.com> wrote:
>
> On Wed, 05 Mar 2025 00:37:06 PST (-0800), zhangchunyan@...as.ac.cn wrote:
> > The assembly is originally based on the ARM NEON and int.uc, but uses
> > RISC-V vector instructions to implement the RAID6 syndrome and
> > recovery calculations.
> >
> > The functions are tested on QEMU running with the option "-icount shift=0":
>
> Does anyone have hardware benchmarks for this?  There's a lot more code
> here than the other targets have.  If all that unrolling is necessary for
> performance on real hardware then it seems fine to me, but just having
> it for QEMU doesn't really tell us much.
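For context on what the unrolled variants compute, here is a minimal scalar sketch of RAID-6 P/Q syndrome generation in the style of the generic int.uc code. P is the XOR of the data blocks. Q is evaluated by Horner's rule in GF(2^8) with generator {02} and the 0x11d polynomial. The sketch also includes the 64-bit SWAR multiply-by-{02} trick that the int64xN variants are built on. In the benchmark names, the xN suffix is roughly the unroll factor, i.e. how many words (or vector registers) each loop iteration handles. The names `gf_mul2`, `gf_mul2_u64` and `gen_syndrome_ref` are hypothetical, and this is not the patch's RVV code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Multiply one byte by {02} in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11d). */
static uint8_t gf_mul2(uint8_t v)
{
	return (uint8_t)((v << 1) ^ ((v & 0x80) ? 0x1d : 0x00));
}

/*
 * SWAR version: multiply all eight bytes packed in a 64-bit word by {02}
 * at once, without letting bits carry from one byte into the next.
 */
static uint64_t gf_mul2_u64(uint64_t v)
{
	uint64_t hi = v & 0x8080808080808080ULL;              /* top bit of each byte */
	uint64_t mask = (hi << 1) - (hi >> 7);                 /* 0xff where top bit was set */
	uint64_t shifted = (v << 1) & 0xfefefefefefefefeULL;   /* drop cross-byte carries */
	return shifted ^ (mask & 0x1d1d1d1d1d1d1d1dULL);       /* reduce flagged bytes */
}

/*
 * Hypothetical reference: P = D_0 ^ ... ^ D_{n-1} and
 * Q = D_0 + g*D_1 + ... + g^(n-1)*D_{n-1} with g = {02},
 * evaluated by Horner's rule from the highest-numbered data disk down.
 */
static void gen_syndrome_ref(int ndata, size_t len, uint8_t **data,
			     uint8_t *p, uint8_t *q)
{
	assert(ndata >= 1);
	for (size_t i = 0; i < len; i++) {
		uint8_t wp = data[ndata - 1][i];
		uint8_t wq = wp;
		for (int z = ndata - 2; z >= 0; z--) {
			uint8_t wd = data[z][i];
			wp ^= wd;               /* P: plain XOR parity */
			wq = gf_mul2(wq) ^ wd;  /* Q: one Horner step */
		}
		p[i] = wp;
		q[i] = wq;
	}
}
```

A vectorized or unrolled version performs the same Horner steps on several independent words or vector registers per iteration, which is where the extra code in the xN variants comes from.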

I ran tests on the Banana Pi BPI-F3 and the Canaan K230.

The BPI-F3 is built around the SpacemiT K1 8-core RISC-V chip. The test
results on the BPI-F3 were:

  raid6: rvvx1    gen()  2916 MB/s
  raid6: rvvx2    gen()  2986 MB/s
  raid6: rvvx4    gen()  2975 MB/s
  raid6: rvvx8    gen()  2763 MB/s
  raid6: int64x8  gen()  1571 MB/s
  raid6: int64x4  gen()  1741 MB/s
  raid6: int64x2  gen()  1639 MB/s
  raid6: int64x1  gen()  1394 MB/s
  raid6: using algorithm rvvx2 gen() 2986 MB/s
  raid6: .... xor() 2 MB/s, rmw enabled
  raid6: using rvv recovery algorithm
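The "using algorithm" line reflects a benchmark-then-pick-the-fastest step. Below is a simplified sketch of that rule, assuming only the measured gen() throughput matters; the real selection code in lib/raid6/algos.c handles more details that this ignores. `struct gen_result` and `pick_fastest` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct gen_result {
	const char *name;   /* algorithm name as printed in the boot log */
	unsigned int mbps;  /* measured gen() throughput in MB/s */
};

/* Index of the highest gen() throughput; the earlier entry wins a tie. */
static size_t pick_fastest(const struct gen_result *r, size_t n)
{
	assert(n > 0);
	size_t best = 0;
	for (size_t i = 1; i < n; i++)
		if (r[i].mbps > r[best].mbps)
			best = i;
	return best;
}
```

Fed the BPI-F3 numbers above, this rule picks rvvx2 (2986 MB/s); fed the K230 numbers, it picks int64x2 (1651 MB/s), matching the "using algorithm" lines in both logs.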

The K230 uses the XuanTie C908 dual-core processor; its larger C908 core
implements the RVV 1.0 extension. The test results on the K230 were:

  raid6: rvvx1    gen()  1556 MB/s
  raid6: rvvx2    gen()  1576 MB/s
  raid6: rvvx4    gen()  1590 MB/s
  raid6: rvvx8    gen()  1491 MB/s
  raid6: int64x8  gen()  1142 MB/s
  raid6: int64x4  gen()  1628 MB/s
  raid6: int64x2  gen()  1651 MB/s
  raid6: int64x1  gen()  1391 MB/s
  raid6: using algorithm int64x2 gen() 1651 MB/s
  raid6: .... xor() 879 MB/s, rmw enabled
  raid6: using rvv recovery algorithm

Among the RVV variants, the fastest unrolling was rvvx2 on the BPI-F3 and
rvvx4 on the K230. On the K230, int64x2 was still slightly faster overall,
which is why it was selected.
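To put the unrolling question in numbers, here is a small helper (hypothetical, for illustration only) that expresses an rvvxN result as a percentage change over the rvvx1 baseline from the same board:

```c
#include <assert.h>

/* Percentage change of `mbps` relative to a baseline such as rvvx1. */
static double gain_pct(unsigned int mbps, unsigned int base_mbps)
{
	assert(base_mbps > 0);
	return ((double)mbps - (double)base_mbps) * 100.0 / (double)base_mbps;
}
```

Applied to the logs above, the best unrolled RVV variant is about 2.4% (rvvx2, BPI-F3) and 2.2% (rvvx4, K230) faster than rvvx1, while rvvx8 is about 5.2% and 4.2% slower, respectively.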

I only have these two RVV boards for now, so I have no test data from
other systems. I'm not sure whether rvvx8 will be needed on other hardware
or in other system environments.

Thanks,
Chunyan