linux-kernel - Re: [PATCH v4] riscv: fix race when vmap stack overflow

Message-ID: <CAJF2gTQ0xuJo6uzB+8SudZOFiZ2_o1sLB=Hn5XuCw6g2tXUtkQ@mail.gmail.com>
Date:   Wed, 30 Nov 2022 15:15:40 +0800
From:   Guo Ren <guoren@...nel.org>
To:     Palmer Dabbelt <palmer@...osinc.com>
Cc:     jszhang@...nel.org, linux-riscv@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] riscv: fix race when vmap stack overflow

The comments are better now. Thanks.

On Wed, Nov 30, 2022 at 10:29 AM Palmer Dabbelt <palmer@...osinc.com> wrote:
>
> From: Jisheng Zhang <jszhang@...nel.org>
>
> Currently, when detecting a vmap stack overflow, riscv first switches
> to the so-called shadow stack, then uses this shadow stack to call
> get_overflow_stack() to get the overflow stack. However, there is
> a race here if two or more harts use the same shadow stack at the same
> time.
>
> To solve this race, we introduce the spin_shadow_stack atomic variable,
> which is atomically swapped between its own address and 0. When the
> variable is set, the shadow_stack is in use; when it is cleared,
> the shadow_stack is not in use.
>
> Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection")
> Signed-off-by: Jisheng Zhang <jszhang@...nel.org>
> Suggested-by: Guo Ren <guoren@...nel.org>
> Reviewed-by: Guo Ren <guoren@...nel.org>
> Link: https://lore.kernel.org/r/20221030124517.2370-1-jszhang@kernel.org
> [Palmer: Add AQ to the swap, and also some comments.]
> Signed-off-by: Palmer Dabbelt <palmer@...osinc.com>
> ---
> Sorry to just re-spin this one without any warning, but I'd read the patch a
> few times and every time I'd managed to convince myself there was a much
> simpler way of doing this.  By the time I'd figured out why that's not
> the case it seemed faster to just write the comments.
>
> I've stashed this, right on top of the offending commit, at
> palmer/riscv-fix_vmap_stack.
>
> Since v3:
>  - Add AQ to the swap.
>  - Add a bunch of comments.
>
> Since v2:
>  - use REG_AMOSWAP
>  - add comment to the purpose of smp_store_release()
>
> Since v1:
>  - use smp_store_release directly
>  - use unsigned int instead of atomic_t
> ---
>  arch/riscv/include/asm/asm.h |  1 +
>  arch/riscv/kernel/entry.S    | 13 +++++++++++++
>  arch/riscv/kernel/traps.c    | 18 ++++++++++++++++++
>  3 files changed, 32 insertions(+)
>
> diff --git a/arch/riscv/include/asm/asm.h b/arch/riscv/include/asm/asm.h
> index 618d7c5af1a2..e15a1c9f1cf8 100644
> --- a/arch/riscv/include/asm/asm.h
> +++ b/arch/riscv/include/asm/asm.h
> @@ -23,6 +23,7 @@
>  #define REG_L          __REG_SEL(ld, lw)
>  #define REG_S          __REG_SEL(sd, sw)
>  #define REG_SC         __REG_SEL(sc.d, sc.w)
> +#define REG_AMOSWAP_AQ __REG_SEL(amoswap.d.aq, amoswap.w.aq)
This is why I used the relaxed version (no .aq) here:
https://lore.kernel.org/all/CAJF2gTRAEX_jQ_w5H05dyafZzHq+P5j05TJ=C+v+OL__GQam4A@mail.gmail.com/T/#u

>  #define REG_ASM                __REG_SEL(.dword, .word)
>  #define SZREG          __REG_SEL(8, 4)
>  #define LGREG          __REG_SEL(3, 2)
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index 98f502654edd..5fdb6ba09600 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -387,6 +387,19 @@ handle_syscall_trace_exit:
>
>  #ifdef CONFIG_VMAP_STACK
>  handle_kernel_stack_overflow:
> +       /*
> +        * Takes the pseudo-spinlock for the shadow stack, in case multiple
> +        * harts are concurrently overflowing their kernel stacks.  We could
> +        * store any value here, but since we're overflowing the kernel stack
> +        * already we only have SP to use as a scratch register.  So we just
> +        * swap in the address of the spinlock, as that's definitely non-zero.
> +        *
> +        * Pairs with a store_release in handle_bad_stack().
> +        */
> +1:     la sp, spin_shadow_stack
> +       REG_AMOSWAP_AQ sp, sp, (sp)
> +       bnez sp, 1b
> +
>         la sp, shadow_stack
>         addi sp, sp, SHADOW_OVERFLOW_STACK_SIZE
>
> diff --git a/arch/riscv/kernel/traps.c b/arch/riscv/kernel/traps.c
> index bb6a450f0ecc..be54ccea8c47 100644
> --- a/arch/riscv/kernel/traps.c
> +++ b/arch/riscv/kernel/traps.c
> @@ -213,11 +213,29 @@ asmlinkage unsigned long get_overflow_stack(void)
>                 OVERFLOW_STACK_SIZE;
>  }
>
> +/*
> + * A pseudo spinlock to protect the shadow stack from being used by multiple
> + * harts concurrently.  This isn't a real spinlock because the lock side must
> + * be taken without a valid stack and only a single register.  It's only taken
> + * while in the process of panicking anyway, so the performance and error
> + * checking a proper spinlock gives us doesn't matter.
> + */
> +unsigned long spin_shadow_stack;
> +
>  asmlinkage void handle_bad_stack(struct pt_regs *regs)
>  {
>         unsigned long tsk_stk = (unsigned long)current->stack;
>         unsigned long ovf_stk = (unsigned long)this_cpu_ptr(overflow_stack);
>
> +       /*
> +        * We're done with the shadow stack by this point, as we're on the
> +        * overflow stack.  Tell any other concurrent overflowing harts that
> +        * they can proceed with panicking by releasing the pseudo-spinlock.
> +        *
> +        * This pairs with an amoswap.aq in handle_kernel_stack_overflow.
> +        */
> +       smp_store_release(&spin_shadow_stack, 0);
> +
>         console_verbose();
>
>         pr_emerg("Insufficient stack space to handle exception!\n");
> --
> 2.38.1
>


--
Best Regards

 Guo Ren