linux-kernel - Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

Open Source and information security mailing list archives

Message-ID: <CAFTtA3MMx+fAk6DsV5tT9HkeW67zstp-Lrge9kZMW0Z60SVi0A@mail.gmail.com>
Date: Fri, 23 May 2025 01:42:49 +0800
From: Andy Chiu <andybnac@...il.com>
To: Deepak Gupta <debug@...osinc.com>
Cc: Ben Dooks <ben.dooks@...ethink.co.uk>, Cyril Bur <cyrilbur@...storrent.com>, 
	palmer@...belt.com, aou@...s.berkeley.edu, paul.walmsley@...ive.com, 
	charlie@...osinc.com, jrtc27@...c27.com, alex@...ti.fr, 
	linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org, 
	jszhang@...nel.org, syzbot+e74b94fe601ab9552d69@...kaller.appspotmail.com
Subject: Re: [PATCH v6 1/5] riscv: save the SR_SUM status over switches

On Thu, May 22, 2025 at 11:09 PM Deepak Gupta <debug@...osinc.com> wrote:
>
> On Thu, May 22, 2025 at 07:23:32AM +0100, Ben Dooks wrote:
> >On 20/05/2025 17:49, Deepak Gupta wrote:
> >>I did give this patch my RB and had planned to come back to it to see
> >>if it impacts cfi related patches. Thanks to alex for brinigng to my
> >>attention again. As it stands today, it doesn't impact cfi related
> >>changes but I've some concerns.
> >>
> >>Overall I do agree we should reduce number of SSTATUS accesses.
> >>
> >>Couple of questions on introducing new `sstatus` field (inline)
> >>
> >>On Tue, Apr 22, 2025 at 04:01:35PM -0700, Deepak Gupta wrote:
> >>>On Thu, Apr 10, 2025 at 07:05:22AM +0000, Cyril Bur wrote:
> >>>>From: Ben Dooks <ben.dooks@...ethink.co.uk>
> >>>>
> >>>>When threads/tasks are switched we need to ensure the old execution's
> >>>>SR_SUM state is saved and the new thread has the old SR_SUM state
> >>>>restored.
> >>>>
> >>>>The issue was seen under heavy load especially with the syz-stress tool
> >>>>running, with crashes as follows in schedule_tail:
> >>>>
> >>>>Unable to handle kernel access to user memory without uaccess routines
> >>>>at virtual address 000000002749f0d0
> >>>>Oops [#1]
> >>>>Modules linked in:
> >>>>CPU: 1 PID: 4875 Comm: syz-executor.0 Not tainted
> >>>>5.12.0-rc2-syzkaller-00467-g0d7588ab9ef9 #0
> >>>>Hardware name: riscv-virtio,qemu (DT)
> >>>>epc : schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>>ra : task_pid_vnr include/linux/sched.h:1421 [inline]
> >>>>ra : schedule_tail+0x70/0xb2 kernel/sched/core.c:4264
> >>>>epc : ffffffe00008c8b0 ra : ffffffe00008c8ae sp : ffffffe025d17ec0
> >>>>gp : ffffffe005d25378 tp : ffffffe00f0d0000 t0 : 0000000000000000
> >>>>t1 : 0000000000000001 t2 : 00000000000f4240 s0 : ffffffe025d17ee0
> >>>>s1 : 000000002749f0d0 a0 : 000000000000002a a1 : 0000000000000003
> >>>>a2 : 1ffffffc0cfac500 a3 : ffffffe0000c80cc a4 : 5ae9db91c19bbe00
> >>>>a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe000082eba
> >>>>s2 : 0000000000040000 s3 : ffffffe00eef96c0 s4 : ffffffe022c77fe0
> >>>>s5 : 0000000000004000 s6 : ffffffe067d74e00 s7 : ffffffe067d74850
> >>>>s8 : ffffffe067d73e18 s9 : ffffffe067d74e00 s10: ffffffe00eef96e8
> >>>>s11: 000000ae6cdf8368 t3 : 5ae9db91c19bbe00 t4 : ffffffc4043cafb2
> >>>>t5 : ffffffc4043cafba t6 : 0000000000040000
> >>>>status: 0000000000000120 badaddr: 000000002749f0d0 cause:
> >>>>000000000000000f
> >>>>Call Trace:
> >>>>[<ffffffe00008c8b0>] schedule_tail+0x72/0xb2 kernel/sched/core.c:4264
> >>>>[<ffffffe000005570>] ret_from_exception+0x0/0x14
> >>>>Dumping ftrace buffer:
> >>>> (ftrace buffer empty)
> >>>>---[ end trace b5f8f9231dc87dda ]---
> >>>>
> >>>>The issue comes from the put_user() in schedule_tail
> >>>>(kernel/sched/core.c) doing the following:
> >>>>
> >>>>asmlinkage __visible void schedule_tail(struct task_struct *prev)
> >>>>{
> >>>>...
> >>>>      if (current->set_child_tid)
> >>>>              put_user(task_pid_vnr(current), current->set_child_tid);
> >>>>...
> >>>>}
> >>>>
> >>>>the put_user() macro causes the code sequence to come out as follows:
> >>>>
> >>>>1:    __enable_user_access()
> >>>>2:    reg = task_pid_vnr(current);
> >>>>3:    *current->set_child_tid = reg;
> >>>>4:    __disable_user_access()
> >>>>
> >>>>The problem is that we may have a sleeping function as argument which
> >>>>could clear SR_SUM causing the panic above. This was fixed by
> >>>>evaluating the argument of the put_user() macro outside the user-enabled
> >>>>section in commit 285a76bb2cf5 ("riscv: evaluate put_user() arg before
> >>>>enabling user access")"
> >>>>
> >>>>In order for riscv to take advantage of unsafe_get/put_XXX() macros and
> >>>>to avoid the same issue we had with put_user() and sleeping functions we
> >>>>must ensure code flow can go through switch_to() from within a region of
> >>>>code with SR_SUM enabled and come back with SR_SUM still enabled. This
> >>>>patch addresses the problem allowing future work to enable full use of
> >>>>unsafe_get/put_XXX() macros without needing to take a CSR bit flip cost
> >>>>on every access. Make switch_to() save and restore SR_SUM.
> >>>>
> >>>>Reported-by: syzbot+e74b94fe601ab9552d69@...kaller.appspotmail.com
> >>>>Signed-off-by: Ben Dooks <ben.dooks@...ethink.co.uk>
> >>>>Signed-off-by: Cyril Bur <cyrilbur@...storrent.com>
> >>>>---
> >>>>arch/riscv/include/asm/processor.h | 1 +
> >>>>arch/riscv/kernel/asm-offsets.c    | 5 +++++
> >>>>arch/riscv/kernel/entry.S          | 8 ++++++++
> >>>>3 files changed, 14 insertions(+)
> >>>>
> >>>>diff --git a/arch/riscv/include/asm/processor.h
> >>>>b/arch/riscv/include/ asm/processor.h
> >>>>index 5f56eb9d114a..58fd11c89fe9 100644
> >>>>--- a/arch/riscv/include/asm/processor.h
> >>>>+++ b/arch/riscv/include/asm/processor.h
> >>>>@@ -103,6 +103,7 @@ struct thread_struct {
> >>>>    struct __riscv_d_ext_state fstate;
> >>>>    unsigned long bad_cause;
> >>>>    unsigned long envcfg;
> >>>>+    unsigned long status;
> >>
> >>Do we really need a new member field in `thread_struct`. We already have
> >>`sstatus` in `pt_regs` which reflects overall execution environment
> >>situation
> >>for current thread. This gets saved and restored on trap entry and exit.
> >>
> >>If we put `status` in `thread_struct` it creates ambiguity in terms
> >>of which
> >>`status` to save to and pick from from future maintainibility
> >>purposes as the
> >>fields get introduced to this CSR.
> >>
> >>Why can't we access current trap frame's `sstatus` image in
> >>`__switch_to` to
> >>save and restore?
> >>
> >>Let me know if I am missing something obvious here. If there is a
> >>complication,
> >>I am missing here and we do end up using this member field, I would
> >>rename it
> >>to something like `status_kernel` to reflect that. So that future
> >>changes are
> >>cognizant of the fact that we have split `status`. One for kernel
> >>execution env
> >>per thread and one for controlling user execution env per thread.
> >
> >This is so long ago now I cannot remember if there was any sstatus in
> >the pt_regs field,
>
> FS/VS bits encode status of floating point and vector on per-thread basis.
> So `status` has been part of `pt_regs` for quite a while.
>
> > and if kernel threads have the same context as their
> >userland parts.
>
> I didn't mean kernel thread. What I meant was kernel execution environment
> per-thread. A userland thread does spend sometime in kernel and kernel does
> things on its behalf. One of those thing is touching user memory and that
> requires mucking with this CSR. So what I meant was are we splitting `status`
> on per-thread basis for their time spent in user and kernel.
>
> Getting back to original question--
> As I said, each thread spends sometime in user or in kernel. `status` in
> `pt_regs` is saved on trap entry and restored on trap exit. In a sense,
> `status` field in `pt_regs` is reflecting execution status of the thread on per
> trap basis. Introducing `status` in `thread_struct` creates a confusion (if not
> for today, certainly for future) of which `status` to pick from when we are
> doing save/restore.

I agree that it's a confusion. sstatus is already saved on pt_regs on
trap entries/return, adding another entry adds code complexity and
makes data inconsistent. But, perhaps we'd eventually need something
like this (I will explain why). Still, there might be a better
approach.

Yes, we can always reflect pt_regs for sstatus. We all know that
pt_regs reflects sstatus at trap entry, and the pt_regs at scheduler
point refers to "user's" pt_regs whenever it first enters kernel mode. Here
are reasons why SR_SUM here may or may not be properly tracked. First,
if this is a trap introduced context switch (such as interrupting in a
preemptible context after we manually enable user access in put_user),
then SR_SUM is saved somewhere in the kernel stack, and is not
reference-able with task_pt_reg during context switch. But we are safe
because the trap exit asm would help us restore the correct SR_SUM
back. However, if this is a self-initiating context switch (calling
into schedule()), then SR_SUM is not saved anywhere, and possibly
causing this error.

Preemptible Vector in the kernel mode also had this problem where a
self-initiating context switch loses the track of sstatus.vs. The way
I managed it is to track the VS bit at context switch time. However,
this bug shows that people are repeatedly facing the problem, and
maybe it suggests that we'd need a better way of managing sstatus
across context switches. Given the complex nature of this register,
which also touches the interrupt enable status, I don't think naively
saving/restoring the entire register is the way to go. Maybe the
variable deserves a more specific naming and documentation. And if
we'd need a centralized place for managing these statuses, then it
also has to take care of sstatus.VS.

Thanks,
Andy

>
> So my first question was why not to use `status` in `pt_regs`. It is granular
> as it can get (it is available per thread context per trap basis).
>
>
> I did ask Alex as well. I'll ping him again.
>
> >
> >Does anyone else have any comment on this?
> >
> >>
> >>>>    u32 riscv_v_flags;
> >>>>    u32 vstate_ctrl;
> >>>>    struct __riscv_v_ext_state vstate;
> >>>>diff --git a/arch/riscv/kernel/asm-offsets.c
> >>>>b/arch/riscv/kernel/asm- offsets.c
> >>>>index 16490755304e..969c65b1fe41 100644
> >>>>--- a/arch/riscv/kernel/asm-offsets.c
> >>>>+++ b/arch/riscv/kernel/asm-offsets.c
> >>>>@@ -34,6 +34,7 @@ void asm_offsets(void)
> >>>>    OFFSET(TASK_THREAD_S9, task_struct, thread.s[9]);
> >>>>    OFFSET(TASK_THREAD_S10, task_struct, thread.s[10]);
> >>>>    OFFSET(TASK_THREAD_S11, task_struct, thread.s[11]);
> >>
> >>_______________________________________________
> >>linux-riscv mailing list
> >>linux-riscv@...ts.infradead.org
> >>http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>
> >
> >
> >--
> >Ben Dooks                              http://www.codethink.co.uk/
> >Senior Engineer                                Codethink - Providing Genius
> >
> >https://www.codethink.co.uk/privacy.html
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@...ts.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives