Message-ID: <aI00nzzma4gXrmh/@x1>
Date: Fri, 1 Aug 2025 14:41:51 -0700
From: Drew Fustini <fustini@...nel.org>
To: Palmer Dabbelt <palmer@...belt.com>
Cc: rkrcmar@...tanamicro.com, Bjorn Topel <bjorn@...osinc.com>,
Alexandre Ghiti <alex@...ti.fr>,
Paul Walmsley <paul.walmsley@...ive.com>, samuel.holland@...ive.com,
dfustini@...storrent.com, andybnac@...il.com,
Conor Dooley <conor.dooley@...rochip.com>,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-riscv-bounces@...ts.infradead.org
Subject: Re: [PATCH] riscv: Add sysctl to control discard of vstate during
syscall

On Wed, Jul 30, 2025 at 06:05:59PM -0700, Palmer Dabbelt wrote:
> My first guess here would be that trashing the V register state is still
> faster on the machines that triggered this patch, it's just that the way
> we're trashing it is slow. We're doing some wacky things in there (VILL,
> LMUL, clearing to -1), so it's not surprising that some implementations are
> slow on these routines.
>
> This came up during the original patch and we decided to just go with this
> way (which is recommended by the ISA) until someone could demonstrate it's
> slow, so sounds like it's time to go revisit those.
>
> So I'd start with something like
>
> diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> index b61786d43c20..1fba33e62d2b 100644
> --- a/arch/riscv/include/asm/vector.h
> +++ b/arch/riscv/include/asm/vector.h
> @@ -287,7 +287,6 @@ static inline void __riscv_v_vstate_discard(void)
> "vmv.v.i v8, -1\n\t"
> "vmv.v.i v16, -1\n\t"
> "vmv.v.i v24, -1\n\t"
> - "vsetvl %0, x0, %1\n\t"
> ".option pop\n\t"
> : "=&r" (vl) : "r" (vtype_inval));
>
> to try and see if we're tripping over bad implementation behavior, in which
> case we can just hide this all in the kernel. Then we can split out these
> performance issues from other things like lazy save/restore and a
> V-preserving uABI, as it stands this is all sort of getting mixed up.
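(For context, with that hunk applied the discard routine would look
roughly like the sketch below. This is reconstructed from the diff
context above, so the vsetvli operands, the vtype_inval value, and the
lines outside the hunk are approximations rather than a verbatim copy
of arch/riscv/include/asm/vector.h.)

static inline void __riscv_v_vstate_discard(void)
{
	/* assumed: vtype_inval is just the vill bit (MSB of vtype) */
	unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1);

	asm volatile (
		".option push\n\t"
		".option arch, +v\n\t"
		/* e8/m8 grouping so each vmv.v.i clobbers 8 registers */
		"vsetvli	%0, x0, e8, m8, ta, ma\n\t"
		"vmv.v.i	v0, -1\n\t"
		"vmv.v.i	v8, -1\n\t"
		"vmv.v.i	v16, -1\n\t"
		"vmv.v.i	v24, -1\n\t"
		/* removed by the hunk above:
		 * "vsetvl	%0, x0, %1\n\t"   (sets VILL)
		 */
		".option pop\n\t"
		: "=&r" (vl) : "r" (vtype_inval));
}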
Thank you for your insights and for the suggestion to remove vsetvl.

Using our v6.16-rc1 branch [1], the average duration of getppid() is
198 ns with the existing upstream behavior in __riscv_v_vstate_discard():
debian@...blackhole:~$ ./null_syscall --vsetvli
vsetvli complete
iterations: 1000000000
duration: 198 seconds
avg latency: 198.10 ns
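(For anyone wanting to reproduce this: the measurement is essentially a
tight getppid() loop timed with clock_gettime(). The sketch below is an
illustrative standalone version, not the actual null_syscall source; the
real tool's --vsetvli mode presumably also executes a vsetvli first so
that the task has live V state and the discard path is exercised.)

#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
	const uint64_t iterations = 1000000000ULL;
	struct timespec start, end;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (uint64_t i = 0; i < iterations; i++)
		syscall(SYS_getppid);	/* cheap "do nothing" syscall */
	clock_gettime(CLOCK_MONOTONIC, &end);

	uint64_t ns = (end.tv_sec - start.tv_sec) * 1000000000ULL
		    + (end.tv_nsec - start.tv_nsec);

	printf("iterations: %llu\n", (unsigned long long)iterations);
	printf("duration: %llu seconds\n",
	       (unsigned long long)(ns / 1000000000ULL));
	printf("avg latency: %.2f ns\n", (double)ns / (double)iterations);
	return 0;
}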
I removed 'vsetvl' as you suggested, but the average latency only
decreased slightly, to 197.5 ns, so it seems the other instructions are
what account for most of the time on the X280 cores:
debian@...blackhole:~$ ./null_syscall --vsetvli
vsetvli complete
iterations: 1000000000
duration: 197 seconds
avg latency: 197.53 ns

For comparison, the average latency is 150 ns when using this patch with
abi.riscv_v_vstate_discard=0, which skips all of the clobbering assembly.
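(For anyone skimming the thread: the patch just gates the existing
discard call on a sysctl under /proc/sys/abi, roughly along the lines
below. The identifier names here are illustrative, not a quote of the
actual patch.)

static int riscv_v_vstate_discard_enabled = 1;

static struct ctl_table riscv_v_abi_table[] = {
	{
		.procname	= "riscv_v_vstate_discard",
		.data		= &riscv_v_vstate_discard_enabled,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec_minmax,
		.extra1		= SYSCTL_ZERO,
		.extra2		= SYSCTL_ONE,
	},
};
/* registered with register_sysctl("abi", riscv_v_abi_table) at init */

/* syscall entry path: skip the clobber when the knob is 0 */
static inline void riscv_v_vstate_discard(struct pt_regs *regs)
{
	if (riscv_v_vstate_query(regs) &&
	    READ_ONCE(riscv_v_vstate_discard_enabled))
		__riscv_v_vstate_discard();
}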
Do you have any other suggestions for the __riscv_v_vstate_discard()
inline assembly that might be worth me testing on the X280 cores?

Thanks,
Drew

[1] https://github.com/tenstorrent/linux/tree/tt-blackhole-v6.16-rc1