[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aJJSpia1clSebQpy@x1>
Date: Tue, 5 Aug 2025 11:51:18 -0700
From: Drew Fustini <fustini@...nel.org>
To: Palmer Dabbelt <palmer@...belt.com>
Cc: rkrcmar@...tanamicro.com, Bjorn Topel <bjorn@...osinc.com>,
Alexandre Ghiti <alex@...ti.fr>,
Paul Walmsley <paul.walmsley@...ive.com>, samuel.holland@...ive.com,
dfustini@...storrent.com, andybnac@...il.com,
Conor Dooley <conor.dooley@...rochip.com>,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-riscv-bounces@...ts.infradead.org
Subject: Re: [PATCH] riscv: Add sysctl to control discard of vstate during
syscall
On Fri, Aug 01, 2025 at 02:41:51PM -0700, Drew Fustini wrote:
> On Wed, Jul 30, 2025 at 06:05:59PM -0700, Palmer Dabbelt wrote:
> > My first guess here would be that trashing the V register state is still
> > faster on the machines that triggered this patch, it's just that the way
> > we're trashing it is slow. We're doing some wacky things in there (VILL,
> > LMUL, clearing to -1), so it's not surprising that some implementations are
> > slow on these routines.
> >
> > This came up during the original patch and we decided to just go with this
> > way (which is recommended by the ISA) until someone could demonstrate it's
> > slow, so sounds like it's time to go revisit those.
> >
> > So I'd start with something like
> >
> > diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
> > index b61786d43c20..1fba33e62d2b 100644
> > --- a/arch/riscv/include/asm/vector.h
> > +++ b/arch/riscv/include/asm/vector.h
> > @@ -287,7 +287,6 @@ static inline void __riscv_v_vstate_discard(void)
> > "vmv.v.i v8, -1\n\t"
> > "vmv.v.i v16, -1\n\t"
> > "vmv.v.i v24, -1\n\t"
> > - "vsetvl %0, x0, %1\n\t"
> > ".option pop\n\t"
> > : "=&r" (vl) : "r" (vtype_inval));
> >
> > to try and see if we're tripping over bad implementation behavior, in which
> > case we can just hide this all in the kernel. Then we can split out these
> > performance issues from other things like lazy save/restore and a
> > V-preserving uABI, as it stands this is all sort of getting mixed up.
>
> Thank you for your insights and the suggestion of removing vsetvl.
>
> Using our v6.16-rc1 branch [1], the avg duration of getppid() is 198 ns
> with the existing upstream behavior in __riscv_v_vstate_discard():
>
> debian@...blackhole:~$ ./null_syscall --vsetvli
> vsetvli complete
> iterations: 1000000000
> duration: 198 seconds
> avg latency: 198.10 ns
>
> I removed 'vsetvl' as you suggested but the average duration only
> decreased a very small amount to 197.5 ns, so it seems that the other
> instructions are what is taking a lot of time on the X280 cores:
>
> debian@...blackhole:~$ ./null_syscall --vsetvli
> vsetvli complete
> iterations: 1000000000
> duration: 197 seconds
> avg latency: 197.53 ns
>
> This is compared to a duration of 150 ns when using this patch with
> abi.riscv_v_vstate_discard=0 which skips all the clobbering assembly.
>
> Do you have any other suggestions for the __riscv_v_vstate_discard()
> inline assembly that might be worth me testing on the X280 cores?
I have tried leaving vsetvl but removing vmv.v.i instructions instead.
This made a difference on the X280 and reduced duration from 198 ns to
161 ns. This compared to an average duration of 148 ns when doing no
clobbering at all.
However, removing the vmv.v.i from the discard assembly doesn't help
much on our own out-of-order core due to still having to update the
vector state in status. Thus I'm still keen to have some way to entirely
opt out of __riscv_v_vstate_discard() on the do_trap_ecall_u() path.
Thanks,
Drew
Powered by blists - more mailing lists