[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aC-oelGP53vtpuJG@debug.ba.rivosinc.com>
Date: Thu, 22 May 2025 15:43:06 -0700
From: Deepak Gupta <debug@...osinc.com>
To: Andy Chiu <andybnac@...il.com>
Cc: Ben Dooks <ben.dooks@...ethink.co.uk>,
Cyril Bur <cyrilbur@...storrent.com>, palmer@...belt.com,
aou@...s.berkeley.edu, paul.walmsley@...ive.com,
charlie@...osinc.com, jrtc27@...c27.com, alex@...ti.fr,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
jszhang@...nel.org,
syzbot+e74b94fe601ab9552d69@...kaller.appspotmail.com
Subject: Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches
On Fri, May 23, 2025 at 01:42:49AM +0800, Andy Chiu wrote:
>On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@...osinc.com> wrote:
>>
>> On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
>> >On 20/05/2025 17:49, Deepak Gupta wrote:
>> >>I did give this patch my RB and had planned to come back to it to see
>> >>if it impacts cfi related patches. Thanks to alex for brinigng to my
>> >>attention again. As it stands today, it doesn't impact cfi related
>> >>changes but I've some concerns.
>> >>
>> >>Overall I do agree we should reduce number of SSTATUS accesses.
>> >>
>> >>Couple of questions on introducing new `sstatus` field (inline)
>> >>
>> >>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
>> >>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
>> >>>>From: Ben Dooks <ben.dooks@...ethink.co.uk>
>> >>>>
>> >>>>When threads/tasks are switched we need to ensure the old execution's
>> >>>>SR_SUM state is saved and the new thread has the old SR_SUM state
>> >>>>restored.
>> >>>>
>> >>>>The issue was seen under heavy load especially with the syz-stress tool
>> >>>>running, with crashes as follows in schedule_tail:
>> >>>>
>> >>>>Unable to handle kernel access to user memory without uaccess routines
>> >>>>at virtual address 000000002749f0d0
>> >>>>Oops [#1]
>> >>>>Modules linked in:
>> >>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
>> >>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
>> >>>>Hardware name: riscv-virtio,qemu (DT)
>> >>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> >>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
>> >>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
>> >>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
>> >>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
>> >>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
>> >>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
>> >>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
>> >>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
>> >>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
>> >>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
>> >>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
>> >>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
>> >>>>t5 : ffffffc4043cafba t6 : 0000000000040000
>> >>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
>> >>>>000000000000000f
>> >>>>Call Trace:
>> >>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
>> >>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
>> >>>>Dumping ftrace buffer:
>> >>>> (ftrace buffer empty)
>> >>>>---[ end trace b5f8f9231dc87dda ]---
>> >>>>
>> >>>>The issue comes from the put_user() in schedule_tail
>> >>>>(kernel/sched/core.c) doing the following:
>> >>>>
>> >>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
>> >>>>{
>> >>>>...
>> >>>> if (current->set_child_tid)
>> >>>> put_user(task_pid_vnr(current), current->set_child_tid);
>> >>>>...
>> >>>>}
>> >>>>
>> >>>>the put_user() macro causes the code sequence to come out as follows:
>> >>>>
>> >>>>1: __enable_user_access()
>> >>>>2: reg = task_pid_vnr(current);
>> >>>>3: *current->set_child_tid = reg;
>> >>>>4: __disable_user_access()
>> >>>>
>> >>>>The problem is that we may have a sleeping function as argument which
>> >>>>could clear SR_SUM causing the panic above. This was fixed by
>> >>>>evaluating the argument of the put_user() macro outside the user-enabled
>> >>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
>> >>>>enabling user access")"
>> >>>>
>> >>>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
>> >>>>to avoid the same issue we had with put_user() and sleeping functions we
>> >>>>must ensure code flow can go through switch_to() from within a region of
>> >>>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
>> >>>>patch addresses the problem allowing future work to enable full use of
>> >>>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
>> >>>>on every access. Make switch_to() save and restore SR_SUM.
>> >>>>
>> >>>>Reported-by: syzbot+e74b94fe601ab9552d69@...kaller.appspotmail.com
>> >>>>Signed-off-by: Ben Dooks <ben.dooks@...ethink.co.uk>
>> >>>>Signed-off-by: Cyril Bur <cyrilbur@...storrent.com>
>> >>>>---
>> >>>>arch/riscv/include/asm/processor.h | 1 +
>> >>>>arch/riscv/kernel/asm-offsets.c | 5 +++++
>> >>>>arch/riscv/kernel/entry.S | 8 ++++++++
>> >>>>3 files changed, 14 insertions(+)
>> >>>>
>> >>>>diff --git a/arch/riscv/include/asm/processor.h
>> >>>>b/arch/riscv/include/ asm/processor.h
>> >>>>index 5f56eb9d114a..58fd11c89fe9 100644
>> >>>>--- a/arch/riscv/include/asm/processor.h
>> >>>>+++ b/arch/riscv/include/asm/processor.h
>> >>>>@@ -103,6 +103,7 @@ struct thread_struct {
>> >>>> struct __riscv_d_ext_state fstate;
>> >>>> unsigned long bad_cause;
>> >>>> unsigned long envcfg;
>> >>>>+ unsigned long status;
>> >>
>> >>Do we really need a new member field in `thread_struct`. We already have
>> >>`sstatus` in `pt_regs` which reflects overall execution environment
>> >>situation
>> >>for current thread. This gets saved and restored on trap entry and exit.
>> >>
>> >>If we put `status` in `thread_struct` it creates ambiguity in terms
>> >>of which
>> >>`status` to save to and pick from from future maintainibility
>> >>purposes as the
>> >>fields get introduced to this CSR.
>> >>
>> >>Why can't we access current trap frame's `sstatus` image in
>> >>`__switch_to` to
>> >>save and restore?
>> >>
>> >>Let me know if I am missing something obvious here. If there is a
>> >>complication,
>> >>I am missing here and we do end up using this member field, I would
>> >>rename it
>> >>to something like `status_kernel` to reflect that. So that future
>> >>changes are
>> >>cognizant of the fact that we have split `status`. One for kernel
>> >>execution env
>> >>per thread and one for controlling user execution env per thread.
>> >
>> >This is so long ago now I cannot remember if there was any sstatus in
>> >the pt_regs field,
>>
>> FS/VS bits encode status of floating point and vector on per-thread basis.
>> So `status` has been part of `pt_regs` for quite a while.
>>
>> > and if kernel threads have the same context as their
>> >userland parts.
>>
>> I didn't mean kernel thread. What I meant was kernel execution environment
>> per-thread. A userland thread does spend sometime in kernel and kernel does
>> things on its behalf. One of those thing is touching user memory and that
>> requires mucking with this CSR. So what I meant was are we splitting `status`
>> on per-thread basis for their time spent in user and kernel.
>>
>> Getting back to original question--
>> As I said, each thread spends sometime in user or in kernel. `status` in
>> `pt_regs` is saved on trap entry and restored on trap exit. In a sense,
>> `status` field in `pt_regs` is reflecting execution status of the thread on per
>> trap basis. Introducing `status` in `thread_struct` creates a confusion (if not
>> for today, certainly for future) of which `status` to pick from when we are
>> doing save/restore.
>
>I agree that it's a confusion. sstatus is already saved on pt_regs on
>trap entries/return, adding another entry adds code complexity and
>makes data inconsistent. But, perhaps we'd eventually need something
>like this (I will explain why). Still, there might be a better
>approach.
>
>Yes, we can always reflect pt_regs for sstatus. We all know that
>pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
>point refers to "user's" pt_regs whenever it first enters kernel mode. Here
>are reasons why SR_SUM here may or may not be properly tracked. First,
>if this is a trap introduced context switch (such as interrupting in a
>preemptible context after we manually enable user access in put_user),
>then SR_SUM is saved somewhere in the kernel stack, and is not
>reference-able with task_pt_reg during context switch. But we are safe
>because the trap exit asm would help us restore the correct SR_SUM
>back. However, if this is a self-initiating context switch (calling
>into schedule()), then SR_SUM is not saved anywhere, and possibly
>causing this error.
>
>Preemptible Vector in the kernel mode also had this problem where a
>self-initiating context switch loses the track of sstatus.vs. The way
>I managed it is to track the VS bit at context switch time. However,
>this bug shows that people are repeatedly facing the problem, and
>maybe it suggests that we'd need a better way of managing sstatus
>across context switches. Given the complex nature of this register,
>which also touches the interrupt enable status, I don't think naively
>saving/restoring the entire register is the way to go. Maybe the
>variable deserves a more specific naming and documentation. And if
>we'd need a centralized place for managing these statuses, then it
>also has to take care of sstatus.VS.
IMHO, the problem we are trying to solve in this patch is easily solvable in
below manner.
diff --git a/arch/riscv/include/asm/switch_to.h b/arch/riscv/include/asm/switch_to.h
index 0e71eb82f920..499d00a6fb67 100644
--- a/arch/riscv/include/asm/switch_to.h
+++ b/arch/riscv/include/asm/switch_to.h
@@ -58,6 +58,20 @@ static inline void __switch_to_fpu(struct task_struct *prev,
fstate_restore(next, task_pt_regs(next));
}
+static inline void __switch_to_status(struct task_struct *prev,
+ struct task_struct *next)
+{
+ struct pt_regs *regs;
+
+ /* save status */
+ regs = task_pt_regs(prev);
+ regs->status = csr_read(CSR_STATUS);
+
+ /* restore status */
+ regs = task_pt_regs(next);
+ csr_write(CSR_STATUS, regs->status);
+}
+
static __always_inline bool has_fpu(void)
{
return riscv_has_extension_likely(RISCV_ISA_EXT_f) ||
@@ -115,6 +129,7 @@ do { \
struct task_struct *__prev = (prev); \
struct task_struct *__next = (next); \
__set_prev_cpu(__prev->thread); \
+ __switch_to_status(__prev, __next) \
if (has_fpu()) \
__switch_to_fpu(__prev, __next); \
if (has_vector() || has_xtheadvector()) \
diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 8d25837a9384..a3b98c1be055 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -162,17 +162,8 @@ SYM_CODE_START(handle_exception)
REG_S x5, PT_T0(sp)
save_from_x6_to_x31
- /*
- * Disable user-mode memory access as it should only be set in the
- * actual user copy routines.
- *
- * Disable the FPU/Vector to detect illegal usage of floating point
- * or vector in kernel space.
- */
- li t0, SR_SUM | SR_FS_VS | SR_ELP
-
REG_L s0, TASK_TI_USER_SP(tp)
- csrrc s1, CSR_STATUS, t0
+ csrr s1, CSR_STATUS
save_userssp s2, s1
csrr s2, CSR_EPC
csrr s3, CSR_TVAL
@@ -185,6 +176,16 @@ SYM_CODE_START(handle_exception)
REG_S s4, PT_CAUSE(sp)
REG_S s5, PT_TP(sp)
+ /*
+ * It is fresh trap entry. Disable user-mode memory access as it should only be set in the
+ * actual user copy routines.
+ *
+ * Disable the FPU/Vector to detect illegal usage of floating point
+ * or vector in kernel space.
+ */
+ li t0, SR_SUM | SR_FS_VS | SR_ELP
+ csrrc s1, CSR_STATUS, t0
+
/*
* Set the scratch register to 0, so that if a recursive exception
* occurs, the exception vector knows it came from the kernel
During the time spent in kernel if sets SUM bit in status then, above
`__switch_to_status` will ensure that `status` will get saved for current
thread and restored for next thread.
Furthermore, current trap entry code clears FS/VS/SUM (for right reasons). It
represents non-linear change of control flow and thus whatever will execute next
shouldn't need SUM/FS/VS unless it wants to set it). This patch slightly
modifies the flow by first saving the `status` on trap frame (thus if previous
trap frame had SUM=1, it will be saved and restored). And then it
unconditionally clears the SUM/FS/VS to ensure that this new trap context runs
without needing SUM=1. This ensures nesting of trap frames without diluting
security properties of SUM.
>
>Thanks,
>Andy
>
>
>
>
>>
>> So my first question was why not to use `status` in `pt_regs`. It is granular
>> as it can get (it is available per thread context per trap basis).
>>
>>
>> I did ask Alex as well. I'll ping him again.
>>
>> >
>> >Does anyone else have any comment on this?
>> >
>> >>
>> >>>> u32 riscv_v_flags;
>> >>>> u32 vstate_ctrl;
>> >>>> struct __riscv_v_ext_state vstate;
>> >>>>diff --git a/arch/riscv/kernel/asm-offsets.c
>> >>>>b/arch/riscv/kernel/asm- offsets.c
>> >>>>index 16490755304e..969c65b1fe41 100644
>> >>>>--- a/arch/riscv/kernel/asm-offsets.c
>> >>>>+++ b/arch/riscv/kernel/asm-offsets.c
>> >>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
>> >>>> OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
>> >>>> OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
>> >>>> OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
>> >>
>> >>_______________________________________________
>> >>linux-riscv mailing list
>> >>linux-riscv@...ts.infradead.org
>> >>http://lists.infradead.org/mailman/listinfo/linux-riscv
>> >>
>> >
>> >
>> >--
>> >Ben Dooks http://www.codethink.co.uk/
>> >Senior Engineer Codethink - Providing Genius
>> >
>> >https://www.codethink.co.uk/privacy.html
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@...ts.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
Powered by blists - more mailing lists