[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.21.1904050007050.1802@nanos.tec.linutronix.de>
Date: Fri, 5 Apr 2019 10:35:48 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: "Chang S. Bae" <chang.seok.bae@...el.com>
cc: Ingo Molnar <mingo@...nel.org>, Andy Lutomirski <luto@...nel.org>,
"H . Peter Anvin" <hpa@...or.com>, Andi Kleen <ak@...ux.intel.com>,
Ravi Shankar <ravi.v.shankar@...el.com>,
LKML <linux-kernel@...r.kernel.org>,
Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [RESEND PATCH v6 08/12] x86/fsgsbase/64: Use the per-CPU base
as GSBASE at the paranoid_entry
On Mon, 25 Mar 2019, Thomas Gleixner wrote:
> On Fri, 15 Mar 2019, Chang S. Bae wrote:
> > ENTRY(paranoid_exit)
> > UNWIND_HINT_REGS
> > DISABLE_INTERRUPTS(CLBR_ANY)
> > TRACE_IRQS_OFF_DEBUG
> > + ALTERNATIVE "jmp .Lparanoid_exit_no_fsgsbase", "nop",\
> > + X86_FEATURE_FSGSBASE
> > + wrgsbase %rbx
> > + jmp .Lparanoid_exit_no_swapgs;
>
> Again. A few newlines would make it more readable.
>
> This modifies the semantics of paranoid_entry and paranoid_exit. Looking at
> the usage sites there is the following code in the nmi maze:
>
> /*
> * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
> * as we should not be calling schedule in NMI context.
> * Even with normal interrupts enabled. An NMI should not be
> * setting NEED_RESCHED or anything that normal interrupts and
> * exceptions might do.
> */
> call paranoid_entry
> UNWIND_HINT_REGS
>
> /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
> movq %rsp, %rdi
> movq $-1, %rsi
> call do_nmi
>
> /* Always restore stashed CR3 value (see paranoid_entry) */
> RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
>
> testl %ebx, %ebx /* swapgs needed? */
> jnz nmi_restore
> nmi_swapgs:
> SWAPGS_UNSAFE_STACK
> nmi_restore:
> POP_REGS
>
> I might be missing something, but how is that supposed to work when
> paranoid_entry uses FSGSBASE? I think it's broken, but if it's not then
> there is a big fat comment missing explaining why.
So this _is_ broken.
On entry:
rbx = rdgsbase()
wrgsbase(KERNEL_GS)
On exit:
if (ebx == 0)
swapgs
The resulting matrix:
| ENTRY GS | RBX | EXIT | GS on IRET | RESULT
| | | | |
1 | KERNEL_GS | KERNEL_GS | EBX == 0 | USER_GS | FAIL
| | | | |
2 | KERNEL_GS | KERNEL_GS | EBX != 0 | KERNEL_GS | ok
| | | | |
3 | USER_GS | USER_GS | EBX == 0 | USER_GS | ok
| | | | |
4 | USER_GS | USER_GS | EBX != 0 | KERNEL_GS | FAIL
#1 Just works by chance because it's unlikely that the lower 32bits of a
per CPU kernel GS are all 0.
But it's just a question of probability that this turns into a
non-debuggable once per year crash (think KASLR).
#4 This can happen when the NMI hits the kernel in some other entry code
_BEFORE_ or _AFTER_ swapgs.
User space using GS addressing with GS[31:0] != 0 will crash and burn.
IIRC FSGSBASE is about fast user space GS switching with (almost) no
limits on the value ...
Oh well.
tglx
Powered by blists - more mailing lists