[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20250806-riscv_v_vstate_discard-v2-1-6bfd61b2c23b@kernel.org>
Date: Wed, 06 Aug 2025 07:03:28 -0700
From: Drew Fustini <fustini@...nel.org>
To: Paul Walmsley <paul.walmsley@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>, Alexandre Ghiti <alex@...ti.fr>,
Samuel Holland <samuel.holland@...ive.com>,
Björn Töpel <bjorn@...osinc.com>,
Andy Chiu <andybnac@...il.com>, Conor Dooley <conor.dooley@...rochip.com>
Cc: linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
Drew Fustini <dfustini@...storrent.com>, Drew Fustini <fustini@...nel.org>
Subject: [PATCH v2] riscv: Add sysctl to control discard of vstate during
syscall
From: Drew Fustini <dfustini@...storrent.com>
Clobbering the vector registers can significantly increase system call
latency for some implementations. To mitigate this performance impact, a
sysctl knob is provided that controls whether the vector state is
discarded in the syscall path:
/proc/sys/abi/riscv_v_vstate_discard
Valid values are:
0: Vector state is not always clobbered in all syscalls
1: Mandatory clobbering of vector state in all syscalls
The initial state is controlled by CONFIG_RISCV_ISA_V_VSTATE_DISCARD.
Fixes: 9657e9b7d253 ("riscv: Discard vector state on syscalls")
Signed-off-by: Drew Fustini <dfustini@...storrent.com>
---
Changes in v2:
- Reword the description of the abi.riscv_v_vstate_discard sysctl to
clarify that option '0' does not preserve the vector state - it just
means that vector state will not always be clobbered in the syscall
path.
- Add clarification suggested by Palmer in v1 to the "Vector Register
State Across System Calls" documentation section.
- v1: https://lore.kernel.org/linux-riscv/20250719033912.1313955-1-fustini@kernel.org/
Test results:
-------------
I've tested the impact of riscv_v_vstate_discard() on the SiFive X280
cores [1] in the Tenstorrent Blackhole SoC [2]. The results from the
Blackhole P100 [3] card show that discarding the vector registers
increases null syscall latency by 25%.
The null syscall program [4] executes vsetvli and then calls getppid()
in a loop. The average duration of getppid() is 198 ns when registers
are clobbered in riscv_v_vstate_discard(). The average duration drops
to 149 ns when riscv_v_vstate_discard() skips clobbering the registers
becaise riscv_v_vstate_discard is set to 0.
$ sudo sysctl abi.riscv_v_vstate_discard=1
abi.riscv_v_vstate_discard = 1
$ ./null_syscall --vsetvli
vsetvli complete
iterations: 1000000000
duration: 198 seconds
avg latency: 198.73 ns
$ sudo sysctl abi.riscv_v_vstate_discard=0
abi.riscv_v_vstate_discard = 0
$ ./null_syscall --vsetvli
vsetvli complete
iterations: 1000000000
duration: 149 seconds
avg latency: 149.89 ns
I'm testing on the tt-blackhole-v6.16-rc1_vstate_discard [5] branch that
has 13 patches, including this one, on top of v6.16-rc1. Most are simple
yaml patches for dt bindings along with dts files and a bespoke network
driver. I don't think the other patches are relevant to this discussion.
This patch applies clean on its own to riscv/for-next.
[1] https://www.sifive.com/cores/intelligence-x200-series
[2] https://tenstorrent.com/en/hardware/blackhole
[3] https://github.com/tenstorrent/tt-bh-linux
[4] https://gist.github.com/tt-fustini/ab9b217756912ce75522b3cce11d0d58
[5] https://github.com/tenstorrent/linux/tree/tt-blackhole-v6.16-rc1_vstate_discard
Signed-off-by: Drew Fustini <fustini@...nel.org>
---
Documentation/arch/riscv/vector.rst | 22 ++++++++++++++++++++--
arch/riscv/Kconfig | 10 ++++++++++
arch/riscv/include/asm/vector.h | 4 ++++
arch/riscv/kernel/vector.c | 16 +++++++++++++++-
4 files changed, 49 insertions(+), 3 deletions(-)
diff --git a/Documentation/arch/riscv/vector.rst b/Documentation/arch/riscv/vector.rst
index 3987f5f76a9deb0824e53a72df4c3bf90ac2bee1..b702c00351617165a4d8897c7df68eadcd2d562e 100644
--- a/Documentation/arch/riscv/vector.rst
+++ b/Documentation/arch/riscv/vector.rst
@@ -134,7 +134,25 @@ processes in form of sysctl knob:
3. Vector Register State Across System Calls
---------------------------------------------
-As indicated by version 1.0 of the V extension [1], vector registers are
-clobbered by system calls.
+Linux adopts the syscall ABI proposed by version 1.0 of the V extension [1],
+where vector registers are clobbered by system calls. Specifically:
+
+ Executing a system call causes all caller-saved vector registers
+ (v0-v31, vl, vtype) and vstart to become unspecied.
+
+However, clobbering the vector registers can significantly increase system call
+latency for some implementations. To mitigate this performance impact, a sysctl
+knob is provided that controls whether vector state is always discarded in the
+syscall path:
+
+* /proc/sys/abi/riscv_v_vstate_discard
+
+ Valid values are:
+
+ * 0: Vector state is not always clobbered in all syscalls
+ * 1: Mandatory clobbering of vector state in all syscalls
+
+ Reading this file returns the current discard behavior. The initial state is
+ controlled by CONFIG_RISCV_ISA_V_VSTATE_DISCARD.
1: https://github.com/riscv/riscv-v-spec/blob/master/calling-convention.adoc
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 36061f4732b7496a9c68a9a10f9959849dc2a95c..7bb8a8513135cbc105bd94d273012486a886f724 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -656,6 +656,16 @@ config RISCV_ISA_V_DEFAULT_ENABLE
If you don't know what to do here, say Y.
+config RISCV_ISA_V_VSTATE_DISCARD
+ bool "Enable Vector state discard by default"
+ depends on RISCV_ISA_V
+ default n
+ help
+ Say Y here if you want to always discard vector state in syscalls.
+ Otherwise, userspace has to enable it via the sysctl interface.
+
+ If you don't know what to do here, say N.
+
config RISCV_ISA_V_UCOPY_THRESHOLD
int "Threshold size for vectorized user copies"
depends on RISCV_ISA_V
diff --git a/arch/riscv/include/asm/vector.h b/arch/riscv/include/asm/vector.h
index 45c9b426fcc52321d55d1a4a42030c3b988e53c0..77991013216b9aea1744540caef38589338717ff 100644
--- a/arch/riscv/include/asm/vector.h
+++ b/arch/riscv/include/asm/vector.h
@@ -40,6 +40,7 @@
_res; \
})
+extern bool riscv_v_vstate_discard_ctl;
extern unsigned long riscv_v_vsize;
int riscv_v_setup_vsize(void);
bool insn_is_vector(u32 insn_buf);
@@ -270,6 +271,9 @@ static inline void __riscv_v_vstate_discard(void)
{
unsigned long vl, vtype_inval = 1UL << (BITS_PER_LONG - 1);
+ if (READ_ONCE(riscv_v_vstate_discard_ctl) == 0)
+ return;
+
riscv_v_enable();
if (has_xtheadvector())
asm volatile (THEAD_VSETVLI_T4X0E8M8D1 : : : "t4");
diff --git a/arch/riscv/kernel/vector.c b/arch/riscv/kernel/vector.c
index 901e67adf57608385e6815be1518e70216236eda..d81dcc86e794896dd36803d6e7540aad1dc37d79 100644
--- a/arch/riscv/kernel/vector.c
+++ b/arch/riscv/kernel/vector.c
@@ -26,6 +26,7 @@ static struct kmem_cache *riscv_v_user_cachep;
static struct kmem_cache *riscv_v_kernel_cachep;
#endif
+bool riscv_v_vstate_discard_ctl = IS_ENABLED(CONFIG_RISCV_ISA_V_VSTATE_DISCARD);
unsigned long riscv_v_vsize __read_mostly;
EXPORT_SYMBOL_GPL(riscv_v_vsize);
@@ -307,11 +308,24 @@ static const struct ctl_table riscv_v_default_vstate_table[] = {
},
};
+static const struct ctl_table riscv_v_vstate_discard_table[] = {
+ {
+ .procname = "riscv_v_vstate_discard",
+ .data = &riscv_v_vstate_discard_ctl,
+ .maxlen = sizeof(riscv_v_vstate_discard_ctl),
+ .mode = 0644,
+ .proc_handler = proc_dobool,
+ },
+};
+
static int __init riscv_v_sysctl_init(void)
{
- if (has_vector() || has_xtheadvector())
+ if (has_vector() || has_xtheadvector()) {
if (!register_sysctl("abi", riscv_v_default_vstate_table))
return -EINVAL;
+ if (!register_sysctl("abi", riscv_v_vstate_discard_table))
+ return -EINVAL;
+ }
return 0;
}
---
base-commit: fda589c286040d9ba2d72a0eaf0a13945fc48026
change-id: 20250805-riscv_v_vstate_discard-23ba1c1d1b68
Best regards,
--
Drew Fustini <fustini@...nel.org>
Powered by blists - more mailing lists