Message-ID: <CAAfSe-vD_37uihLjGwOqQKnyKJaJ36OwxDeocMOhK4s6-cpzAA@mail.gmail.com>
Date: Thu, 8 May 2025 15:14:50 +0800
From: Chunyan Zhang <zhang.lyra@...il.com>
To: Palmer Dabbelt <palmer@...belt.com>
Cc: zhangchunyan@...as.ac.cn, Paul Walmsley <paul.walmsley@...ive.com>,
aou@...s.berkeley.edu, Charlie Jenkins <charlie@...osinc.com>, song@...nel.org,
yukuai3@...wei.com, linux-riscv@...ts.infradead.org,
linux-raid@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH V5] raid6: Add RISC-V SIMD syndrome and recovery calculations
Hi Palmer,
On Mon, 31 Mar 2025 at 23:55, Palmer Dabbelt <palmer@...belt.com> wrote:
>
> On Wed, 05 Mar 2025 00:37:06 PST (-0800), zhangchunyan@...as.ac.cn wrote:
> > The assembly is originally based on the ARM NEON and int.uc, but uses
> > RISC-V vector instructions to implement the RAID6 syndrome and
> > recovery calculations.
> >
> > The functions are tested on QEMU running with the option "-icount shift=0":
>
> Does anyone have hardware benchmarks for this? There's a lot more code
> here than the other targets have. If all that unrolling is necessary for
> performance on real hardware then it seems fine to me, but just having
> it for QEMU doesn't really tell us much.
I ran tests on a Banana Pi BPI-F3 and a Canaan K230.
The BPI-F3 is built around the SpacemiT K1 8-core RISC-V chip. The test
result on BPI-F3 was:
raid6: rvvx1 gen() 2916 MB/s
raid6: rvvx2 gen() 2986 MB/s
raid6: rvvx4 gen() 2975 MB/s
raid6: rvvx8 gen() 2763 MB/s
raid6: int64x8 gen() 1571 MB/s
raid6: int64x4 gen() 1741 MB/s
raid6: int64x2 gen() 1639 MB/s
raid6: int64x1 gen() 1394 MB/s
raid6: using algorithm rvvx2 gen() 2986 MB/s
raid6: .... xor() 2 MB/s, rmw enabled
raid6: using rvv recovery algorithm
The K230 uses the XuanTie C908 dual-core processor; the larger C908
core implements the RVV 1.0 extension. The test result on K230 was:
raid6: rvvx1 gen() 1556 MB/s
raid6: rvvx2 gen() 1576 MB/s
raid6: rvvx4 gen() 1590 MB/s
raid6: rvvx8 gen() 1491 MB/s
raid6: int64x8 gen() 1142 MB/s
raid6: int64x4 gen() 1628 MB/s
raid6: int64x2 gen() 1651 MB/s
raid6: int64x1 gen() 1391 MB/s
raid6: using algorithm int64x2 gen() 1651 MB/s
raid6: .... xor() 879 MB/s, rmw enabled
raid6: using rvv recovery algorithm
Among the rvv algorithms, the fastest unrolling was rvvx2 on BPI-F3 and
rvvx4 on K230. On K230 the kernel nevertheless selected int64x2
(1651 MB/s), which was slightly faster than any rvv variant there.
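For anyone who wants to repeat this comparison on other boards, here is a
minimal sketch of how the per-family winners can be read out of the boot
log. It is only an illustration: the helper names (parse_gen_results,
fastest) are hypothetical and are not part of the patch or the kernel, and
the log text is copied from the results above.

```python
import re

# Matches benchmark lines such as "raid6: rvvx2 gen() 2986 MB/s".
# Summary lines ("using algorithm ...") and xor() lines do not match.
LINE_RE = re.compile(r"raid6:\s+(\w+)\s+gen\(\)\s+(\d+)\s+MB/s")

BPI_F3_LOG = """\
raid6: rvvx1 gen() 2916 MB/s
raid6: rvvx2 gen() 2986 MB/s
raid6: rvvx4 gen() 2975 MB/s
raid6: rvvx8 gen() 2763 MB/s
raid6: int64x8 gen() 1571 MB/s
raid6: int64x4 gen() 1741 MB/s
raid6: int64x2 gen() 1639 MB/s
raid6: int64x1 gen() 1394 MB/s
raid6: using algorithm rvvx2 gen() 2986 MB/s
"""

K230_LOG = """\
raid6: rvvx1 gen() 1556 MB/s
raid6: rvvx2 gen() 1576 MB/s
raid6: rvvx4 gen() 1590 MB/s
raid6: rvvx8 gen() 1491 MB/s
raid6: int64x8 gen() 1142 MB/s
raid6: int64x4 gen() 1628 MB/s
raid6: int64x2 gen() 1651 MB/s
raid6: int64x1 gen() 1391 MB/s
raid6: using algorithm int64x2 gen() 1651 MB/s
"""


def parse_gen_results(log):
    """Return {algorithm name: gen() throughput in MB/s} from a raid6 log."""
    results = {}
    for line in log.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            results[m.group(1)] = int(m.group(2))
    return results


def fastest(results, prefix=""):
    """Return the fastest algorithm whose name starts with the given prefix."""
    candidates = {k: v for k, v in results.items() if k.startswith(prefix)}
    return max(candidates, key=candidates.get)


if __name__ == "__main__":
    for board, log in (("BPI-F3", BPI_F3_LOG), ("K230", K230_LOG)):
        res = parse_gen_results(log)
        print(board, "best rvv:", fastest(res, "rvv"),
              "best overall:", fastest(res))
```

Passing the prefix "rvv" restricts the comparison to the vector variants;
an empty prefix compares all gen() results in the log.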
I only have these two RVV boards for now, so I have no test data from
other systems. I'm not sure whether rvvx8 will be needed on some
hardware or in some other system environments.
Thanks,
Chunyan