Message-Id: <DBLCYPBLQBSQ.170ND7Z93GPK4@ventanamicro.com>
Date: Fri, 25 Jul 2025 20:47:04 +0200
From: Radim Krčmář <rkrcmar@...tanamicro.com>
To: "Vivian Wang" <wangruikang@...as.ac.cn>, "Drew Fustini"
<fustini@...nel.org>, "Palmer Dabbelt" <palmer@...belt.com>,
Björn Töpel <bjorn@...osinc.com>, "Alexandre Ghiti"
<alex@...ti.fr>, "Paul Walmsley" <paul.walmsley@...ive.com>, "Samuel
Holland" <samuel.holland@...ive.com>, "Drew Fustini"
<dfustini@...storrent.com>, "Andy Chiu" <andybnac@...il.com>, "Conor
Dooley" <conor.dooley@...rochip.com>, <linux-riscv@...ts.infradead.org>,
<linux-kernel@...r.kernel.org>
Cc: "linux-riscv" <linux-riscv-bounces@...ts.infradead.org>
Subject: Re: [PATCH] riscv: Add sysctl to control discard of vstate during
syscall
2025-07-25T23:01:03+08:00, Vivian Wang <wangruikang@...as.ac.cn>:
> On 7/25/25 18:18, Radim Krčmář wrote:
>> 2025-07-24T05:55:54+08:00, Vivian Wang <wangruikang@...as.ac.cn>:
>>> On 7/19/25 11:39, Drew Fustini wrote:
>>>> From: Drew Fustini <dfustini@...storrent.com>
>>>> Clobbering the vector registers can significantly increase system call
>>>> latency for some implementations. To mitigate this performance impact, a
>>>> policy mechanism is provided to administrators, distro maintainers, and
>>>> developers to control vector state discard in the form of a sysctl knob:
>>> So I had an idea: Is it possible to avoid repeatedly discarding the
>>> state on every syscall by setting VS to Initial after discarding, and
>>> avoiding discarding when VS is Initial? So:
>>>
>>> if (VS == Clean || VS == Dirty) {
>>>         clobber;
>>>         VS = Initial;
>>> }
>>>
>>> This would avoid this problem with syscall-heavy user programs while
>>> adding minimum overhead for everything else.
>> I think your proposal improves the existing code, but if a userspace is
>> using vectors, it's likely also restoring them after a syscall, so the
>> state would immediately get dirty, and the next syscall would again
>> needlessly clobber vector registers.
>
> Without any data to back it up, I would say that my understanding is
> that this should be a rare case, only happening if e.g. someone is
> adding printf debugging to their vector code. Otherwise, vector loops
> should not have syscalls in them.
>
> A more reasonable worry would be programs using RVV everywhere in all
> sorts of common operations. In that case, alternating syscalls and
> vectors would make the discarding wasteful.
Good point. Yeah, auto-vectorization might be hindered.
In the worst case, users could just notice that it's slowing programs
down, and disable it without looking for the cause.
>> Preserving the vector state still seems better for userspaces that use
>> both vectors and syscalls.
>
> Suppose we can expect userspace programs to either primarily use RVV
> repeatedly with no syscalls between loops, *or* primarily use syscalls
> repeatedly with rare occurrences of RVV between syscalls. Then the
> primarily-syscall programs can benefit from slightly faster switching,
> since there's no need to save and restore state for those most of the
> time. In effect, syscalls serve as a hint that RVV is over.
This would need deeper analysis, and we will probably never be correct
with a system-wide policy regardless -- room for a prctl?

I think there might be a lot of programs that have a repeating pattern
of compute -> syscall (e.g. to write results), and clobbering loses
performance if a program does more than a single loop per switch.
> The primarily RVV programs
> should not be switching as much - if they are, that's a sign of CPU
> resources being oversubscribed.
Yes, but clobbering only gives benefits on a switch, so we don't want to
clobber if there are more syscalls than switches.
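
To make that trade-off concrete, here is a back-of-the-envelope sketch in C.
Everything in it is an assumption for illustration: the struct, the function
names, and the abstract cost units are hypothetical and not measured on any
implementation.

```c
#include <stdbool.h>

/*
 * Hypothetical cost model. The names and the abstract cost units are
 * assumptions made for illustration, not measurements.
 */
struct discard_costs {
	unsigned long clobber_cost;  /* cost added to each syscall by clobbering */
	unsigned long switch_saving; /* cost saved per context switch after a discard */
};

/* Net cost of clobbering over an interval; positive means clobbering loses. */
static long clobber_net_cost(const struct discard_costs *c,
			     unsigned long syscalls, unsigned long switches)
{
	return (long)(syscalls * c->clobber_cost) -
	       (long)(switches * c->switch_saving);
}

/* True only if the savings on switches outweigh the per-syscall cost. */
static bool clobber_pays_off(const struct discard_costs *c,
			     unsigned long syscalls, unsigned long switches)
{
	return clobber_net_cost(c, syscalls, switches) < 0;
}
```

With equal per-event costs, a program doing five syscalls per switch comes out
behind, while one that is switched out more often than it makes syscalls comes
out ahead. The real ratio of the two costs depends on the implementation.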
Well, there is a way: a syscall could just set VS=Initial, and if
userspace doesn't dirty vector registers, a restore would set the
registers to whatever the initial state is.
No vector registers are touched on syscall or on save.
This works because we don't have to do anything when "clobbering" -- the
registers are unspecified after a syscall.
The downside is that users might (incorrectly) depend on the unspecified
value without dirtying, so the unspecified value could change at an
arbitrary point, which would provide some interesting debugging cases.
(And it's still suboptimal if software actually wants to preserve
vectors across syscalls.)
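
For comparison, a toy model in C of the three behaviours discussed in this
thread: clobbering on every syscall, clobbering only when VS is Clean or Dirty
and then setting VS=Initial, and only setting VS=Initial. This is a sketch, not
kernel code. The policy names, struct, and helpers are hypothetical, and the
model assumes that clobbering on every syscall leaves VS unchanged and that
only Dirty state is saved on a switch.

```c
#include <stdbool.h>

/* The four states of the sstatus.VS field. */
enum vs_state { VS_OFF, VS_INITIAL, VS_CLEAN, VS_DIRTY };

/* Hypothetical policy names, used only in this sketch. */
enum policy {
	POLICY_ALWAYS_CLOBBER, /* write the vector registers on every syscall */
	POLICY_CLOBBER_ONCE,   /* clobber only if Clean/Dirty, then VS = Initial */
	POLICY_MARK_INITIAL,   /* never write registers, only set VS = Initial */
};

struct vtask {
	enum vs_state vs;
	unsigned long clobbers; /* times a syscall wrote the vector registers */
	unsigned long saves;    /* times a switch had to save vector state */
};

/* Userspace executes a vector instruction that modifies vector state. */
static void user_vector_op(struct vtask *t)
{
	if (t->vs != VS_OFF)
		t->vs = VS_DIRTY;
}

static void syscall_entry(struct vtask *t, enum policy p)
{
	if (t->vs == VS_OFF)
		return;

	switch (p) {
	case POLICY_ALWAYS_CLOBBER:
		t->clobbers++; /* assumption: VS is left as it was */
		break;
	case POLICY_CLOBBER_ONCE:
		if (t->vs == VS_CLEAN || t->vs == VS_DIRTY) {
			t->clobbers++;
			t->vs = VS_INITIAL;
		}
		break;
	case POLICY_MARK_INITIAL:
		t->vs = VS_INITIAL; /* registers untouched, contents unspecified */
		break;
	}
}

/* Assumption: only Dirty state needs to be saved when switching out. */
static void context_switch_out(struct vtask *t)
{
	if (t->vs == VS_DIRTY) {
		t->saves++;
		t->vs = VS_CLEAN;
	}
}

/* One initial vector op, then `iters` syscalls, then one switch. */
static struct vtask run_pattern(enum policy p, bool vector_each_iter, int iters)
{
	struct vtask t = { .vs = VS_INITIAL, .clobbers = 0, .saves = 0 };

	user_vector_op(&t);
	for (int i = 0; i < iters; i++) {
		if (vector_each_iter)
			user_vector_op(&t);
		syscall_entry(&t, p);
	}
	context_switch_out(&t);
	return t;
}
```

In this model, a compute -> syscall loop makes POLICY_CLOBBER_ONCE clobber on
every iteration, just like POLICY_ALWAYS_CLOBBER, while a syscall-heavy program
that used vectors once clobbers only once. POLICY_MARK_INITIAL never writes the
registers in either pattern and leaves nothing to save on the switch.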
> Having said all of that, I am actually slightly more interested in why
> vmv.v.i is *so slow* on SiFive X280. I wonder if there would be a more
> microarchitecturally favorable way to just put a bunch of ones in some
> vector registers? Would 0 be better?
No idea, and there are a lot of options to try, but it would be quite
sad if we had to have a special case for each implementation.
Thanks.