Message-ID: <aIUgTTOcuNAJqZvg@x1>
Date: Sat, 26 Jul 2025 11:37:01 -0700
From: Drew Fustini <fustini@...nel.org>
To: Radim Krčmář <rkrcmar@...tanamicro.com>
Cc: Vivian Wang <wangruikang@...as.ac.cn>,
Palmer Dabbelt <palmer@...belt.com>,
Björn Töpel <bjorn@...osinc.com>,
Alexandre Ghiti <alex@...ti.fr>,
Paul Walmsley <paul.walmsley@...ive.com>,
Samuel Holland <samuel.holland@...ive.com>,
Drew Fustini <dfustini@...storrent.com>,
Andy Chiu <andybnac@...il.com>,
Conor Dooley <conor.dooley@...rochip.com>,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-riscv <linux-riscv-bounces@...ts.infradead.org>
Subject: Re: [PATCH] riscv: Add sysctl to control discard of vstate during syscall
On Fri, Jul 25, 2025 at 08:47:04PM +0200, Radim Krčmář wrote:
> 2025-07-25T23:01:03+08:00, Vivian Wang <wangruikang@...as.ac.cn>:
> > On 7/25/25 18:18, Radim Krčmář wrote:
> >> 2025-07-24T05:55:54+08:00, Vivian Wang <wangruikang@...as.ac.cn>:
> >>> On 7/19/25 11:39, Drew Fustini wrote:
> >>>> From: Drew Fustini <dfustini@...storrent.com>
> >>>> Clobbering the vector registers can significantly increase system call
> >>>> latency for some implementations. To mitigate this performance impact, a
> >>>> policy mechanism is provided to administrators, distro maintainers, and
> >>>> developers to control vector state discard in the form of a sysctl knob:
> >>> So I had an idea: Is it possible to avoid repeatedly discarding the
> >>> state on every syscall by setting VS to Initial after discarding, and
> >>> avoiding discarding when VS is Initial? So:
> >>>
> >>>     if (VS == Clean || VS == Dirty) {
> >>>             clobber;
> >>>             VS = Initial;
> >>>     }
> >>>
> >>> This would avoid this problem with syscall-heavy user programs while
> >>> adding minimum overhead for everything else.
> >> I think your proposal improves the existing code, but if a userspace is
> >> using vectors, it's likely also restoring them after a syscall, so the
> >> state would immediately get dirty, and the next syscall would again
> >> needlessly clobber vector registers.
> >
> > Without any data to back it up, I would say that my understanding is
> > that this should be a rare case, only happening if e.g. someone is
> > adding printf debugging to their vector code. Otherwise, vector loops
> > should not have syscalls in them.
> >
> > A more reasonable worry would be programs using RVV everywhere in all
> > sorts of common operations. In that case, alternating syscalls and
> > vectors would make the discarding wasteful.
>
> Good point. Yeah, auto-vectorization might be hindered.
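To make the two cases in the quoted exchange concrete, here is a rough userspace model of the proposed rule (discard only when VS is Clean or Dirty, then set VS to Initial). It is only a sketch, not kernel code: the enum names, the event trace, and count_clobbers() are all made up for illustration, and it assumes that any vector use between syscalls leaves VS Dirty.

```c
#include <assert.h>
#include <stddef.h>

/* Vector state values as named in the thread (modelled on the VS field). */
enum vs_state { VS_OFF, VS_INITIAL, VS_CLEAN, VS_DIRTY };

/* Toy trace events: userspace touches vector registers, or makes a syscall. */
enum event { EV_VECTOR_USE, EV_SYSCALL };

/*
 * Hypothetical helper: count how many syscalls in a trace actually clobber
 * under the proposed rule. Assumption: vector use always leaves VS Dirty.
 */
static size_t count_clobbers(const enum event *trace, size_t n)
{
    enum vs_state vs = VS_INITIAL;
    size_t clobbers = 0;

    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (trace[i] == EV_VECTOR_USE) {
            vs = VS_DIRTY;      /* vector use dirties the state */
        } else if (vs == VS_CLEAN || vs == VS_DIRTY) {
            clobbers++;         /* discard the registers once */
            vs = VS_INITIAL;    /* following syscalls skip the discard */
        }
    }
    return clobbers;
}
```

In this model, one vector use followed by a run of syscalls costs a single clobber, while a trace that alternates vector use and syscalls clobbers on every syscall, which is the wasteful case discussed above.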
Yes, I think that userspace vector usage will become more common over
time even for "ordinary" programs as compilers and libraries improve.
For example, it may be the case that the majority of userspace binaries
will use vector once the ifunc memcpy patches go in.
> In the worst case, users could just notice that it's slowing programs
> down, and disable it without looking for the cause.
I think that a default policy of not clobbering in syscalls would be the
best trade-off. I gave CONFIG_RISCV_ISA_V_VSTATE_DISCARD a default of n
in this patch, and I imagined that people like Palmer, who wanted it for
test suites, could change the default or use the sysctl.
>
> >> Preserving the vector state still seems better for userspaces that use
> >> both vectors and syscalls.
> >
> > Suppose we can expect e.g. userspace programs to primarily repeatedly use
> > RVV with no syscalls between loops, *or* primarily repeatedly use syscalls
> > with rare occurrences of RVV between syscalls. Then the primarily-syscall
> > programs can benefit from slightly faster switching, since there's no
> > need to save and restore state for those most of the time. In effect,
> > a syscall serves as a hint that RVV use is over.
>
> This would need deeper analysis, and we will probably never be correct
> with a system-wide policy regardless -- room for a prctl?
>
> I think there might be a lot of programs that have a repeating pattern
> of compute -> syscall (e.g. to write results), and clobbering is losing
> performance if a program does more than a single loop per switch.
It's interesting that you mention prctl as it does seem like that could
play a role here. If people think that one syscall clobbering behavior
for the whole system is too limited, then maybe prctl could be a better
solution. I believe it should default to not clobbering. It could be
enabled for test suites in CI that want the strict clobbering, or for
programs that are known to work better with clobbering enabled.
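As a rough sketch of how a per-process setting could layer on top of a system-wide default, something like the following userspace model is what I have in mind. Nothing here exists in the posted patch: the enum, the struct, and should_discard() are hypothetical names, and the real interface would need to be worked out on the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-task setting; no such prctl exists in the posted patch. */
enum discard_policy {
    DISCARD_INHERIT,    /* follow the system-wide default */
    DISCARD_OFF,        /* preserve vector state across syscalls */
    DISCARD_ON          /* strict clobbering, e.g. for CI test suites */
};

/* Stand-in for the per-task state that a prctl would modify. */
struct task_model {
    enum discard_policy policy;
};

/* Decide whether syscall entry should discard vector state for this task. */
static bool should_discard(const struct task_model *t, bool system_default)
{
    assert(t != NULL);
    switch (t->policy) {
    case DISCARD_ON:
        return true;
    case DISCARD_OFF:
        return false;
    default:
        return system_default;  /* DISCARD_INHERIT */
    }
}
```

With the system default set to not clobbering, only tasks that explicitly opt in (for example a test suite) would pay the discard cost.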
Thanks,
Drew