Message-ID: <aI00nzzma4gXrmh/@x1>
Date: Fri, 1 Aug 2025 14:41:51 -0700
From: Drew Fustini <fustini@...nel.org>
To: Palmer Dabbelt <palmer@...belt.com>
Cc: rkrcmar@...tanamicro.com, Bjorn Topel <bjorn@...osinc.com>,
Alexandre Ghiti <alex@...ti.fr>,
Paul Walmsley <paul.walmsley@...ive.com>, samuel.holland@...ive.com,
dfustini@...storrent.com, andybnac@...il.com,
Conor Dooley <conor.dooley@...rochip.com>,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-riscv-bounces@...ts.infradead.org
Subject: Re: [PATCH] riscv: Add sysctl to control discard of vstate during
syscall

On Wed, Jul 30, 2025 at 06:05:59PM -0700, Palmer Dabbelt wrote:
> My first guess here would be that trashing the V register state is still
> faster on the machines that triggered this patch, it's just that the way
> we're trashing it is slow. We're doing some wacky things in there (VILL,
> LMUL, clearing to -1), so it's not surprising that some implementations are
> slow on these routines.
>
> This came up during the original patch and we decided to just go with this
> way (which is recommended by the ISA) until someone could demonstrate it's
> slow, so sounds like it's time to go revisit those.
>
> So I'd start with something like
>
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index b61786d43c20..1fba33e62d2b 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -287,7 +287,6 @@ static inline void __riscv_v_vstate_discard(void)
> "vmv.v.i v8, -1\n\t"
> "vmv.v.i v16, -1\n\t"
> "vmv.v.i v24, -1\n\t"
> - "vsetvl %0, x0, %1\n\t"
> ".option pop\n\t"
> : "=&r" (vl) : "r" (vtype_inval));
>
> to try and see if we're tripping over bad implementation behavior, in which
> case we can just hide this all in the kernel. Then we can split out these
> performance issues from other things like lazy save/restore and a
> V-preserving uABI, as it stands this is all sort of getting mixed up.
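(For context, with that hunk applied the discard routine would look
roughly like the sketch below. This is reconstructed from the diff
context above, so the vsetvli operands, the vtype_inval value, and the
lines outside the hunk are approximations rather than a verbatim copy
of arch/riscv/include/asm/vector.h.)

static inline void __riscv_v_vstate_discard(void)
{
	/* assumed: vtype_inval is just the vill bit (MSB of vtype) */
	unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1);

	asm volatile (
		".option push\n\t"
		".option arch, +v\n\t"
		/* e8/m8 grouping so each vmv.v.i clobbers 8 registers */
		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
		"vmv.v.i	v0, -1\n\t"
		"vmv.v.i	v8, -1\n\t"
		"vmv.v.i	v16, -1\n\t"
		"vmv.v.i	v24, -1\n\t"
		/* removed by the hunk above:
		 * "vsetvl	%0, x0, %1\n\t"   (sets VILL)
		 */
		".option pop\n\t"
		: "=&r" (vl) : "r" (vtype_inval));
}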
Thank you for your insights and for the suggestion to remove vsetvl.

Using our v6.16-rc1 branch [1], the average duration of getppid() is
198 ns with the existing upstream behavior in __riscv_v_vstate_discard():
debian@...blackhole:~$ ./null_syscall --vsetvli
vsetvli complete
iterations: 1000000000
duration: 198 seconds
avg latency: 198.10 ns
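(For anyone wanting to reproduce this: the measurement is essentially a
tight getppid() loop timed with clock_gettime(). The sketch below is an
illustrative standalone version, not the actual null_syscall source; the
real tool's --vsetvli mode presumably also executes a vsetvli first so
that the task has live V state and the discard path is exercised.)

#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	const uint64_t iterations = 1000000000ULL;
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (uint64_t i = 0; i < iterations; i++)
		syscall(SYS_getppid);	/* cheap "do nothing" syscall */
	clock_gettime(CLOCK_MONOTONIC, &end);

	uint64_t ns = (end.tv_sec - start.tv_sec) * 1000000000ULL
		    + (end.tv_nsec - start.tv_nsec);

	printf("iterations: %llu\n", (unsigned long long)iterations);
	printf("duration: %llu seconds\n",
	       (unsigned long long)(ns / 1000000000ULL));
	printf("avg latency: %.2f ns\n", (double)ns / (double)iterations);
	return 0;
}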
I removed 'vsetvl' as you suggested, but the average latency only
decreased slightly, to 197.5 ns, so it seems the other instructions are
what account for most of the time on the X280 cores:
debian@...blackhole:~$ ./null_syscall --vsetvli
vsetvli complete
iterations: 1000000000
duration: 197 seconds
avg latency: 197.53 ns

For comparison, the average latency is 150 ns when using this patch with
abi.riscv_v_vstate_discard=0, which skips all of the clobbering assembly.
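(For anyone skimming the thread: the patch just gates the existing
discard call on a sysctl under /proc/sys/abi, roughly along the lines
below. The identifier names here are illustrative, not a quote of the
actual patch.)

static int riscv_v_vstate_discard_enabled = 1;

static struct ctl_table riscv_v_abi_table[] = {
	{
		.procname	= "riscv_v_vstate_discard",
		.data		= &riscv_v_vstate_discard_enabled,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= SYSCTL_ZERO,
		.extra2		= SYSCTL_ONE,
	},
};
/* registered with register_sysctl("abi", riscv_v_abi_table) at init */

/* syscall entry path: skip the clobber when the knob is 0 */
static inline void riscv_v_vstate_discard(struct pt_regs *regs)
{
	if (riscv_v_vstate_query(regs) &&
	    READ_ONCE(riscv_v_vstate_discard_enabled))
		__riscv_v_vstate_discard();
}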
Do you have any other suggestions for the __riscv_v_vstate_discard()
inline assembly that might be worth me testing on the X280 cores?

Thanks,
Drew

[1] https://github.com/tenstorrent/linux/tree/tt-blackhole-v6.16-rc1