Message-ID: <CAG48ez0h0wUS6y+W1HTOwN14V95gKmmFZ_2TamAX+JKTmXT=DA@mail.gmail.com>
Date: Thu, 13 Feb 2025 03:09:20 +0100
From: Jann Horn <jannh@...gle.com>
To: Andrew Cooper <andrew.cooper3@...rix.com>
Cc: jmill@....edu, joao@...rdrivepizza.com, kees@...nel.org,
linux-hardening@...r.kernel.org, linux-kernel@...r.kernel.org,
luto@...nel.org, samitolvanen@...gle.com,
"Peter Zijlstra (Intel)" <peterz@...radead.org>
Subject: Re: [RFC] Circumventing FineIBT Via Entrypoints
On Thu, Feb 13, 2025 at 2:31 AM Andrew Cooper <andrew.cooper3@...rix.com> wrote:
> >> Assuming this is an issue you all feel is worth addressing, I will
> >> continue working on providing a patch. I'm concerned though that the
> >> overhead from adding a wrmsr on both syscall entry and exit to
> >> overwrite and restore the KERNEL_GS_BASE MSR may be quite high, so
> >> any feedback in regards to the approach or suggestions of alternate
> >> approaches to patching are welcome :)
> >
> > Since the kernel, as far as I understand, uses FineIBT without
> > backwards control flow protection (in other words, I think we assume
> > that the kernel stack is trusted?),
>
> This is fun indeed. Linux cannot use supervisor shadow stacks because
> the mess around NMI re-entrancy (and IST more generally) requires ROP
> gadgets in order to function safely. Implementing this with shadow
> stacks active, while not impossible, is deemed to be prohibitively
> complicated.
>
> Linux's supervisor shadow stack support is waiting for FRED support,
> which fixes both the NMI re-entrancy problem, and other exceptions
> nesting within NMIs, as well as prohibiting the use of the SWAPGS
> instruction as FRED tries to make sure that the correct GS is always in
> context.
>
> But, FRED support is slated for PantherLake/DiamondRapids which haven't
> shipped yet, so are no use to the problem right now.
>
> > could we build a cheaper
> > check on that basis somehow? For example, maybe we could do something like:
> >
> > ```
> > endbr64
> > test rsp, rsp
> > js slowpath
> > swapgs
> > ```
>
> I presume it's been pointed out already, but there are 3 related
> entrypoints here: SYSCALL64, which is the one discussed, plus
> SYSCALL32 and SYSENTER, which are related.
>
> But, any other IDT entry is in a similar bucket. If we're corrupting a
> function pointer or return address to redirect here, then the check of
> CS(%rsp) to control the conditional SWAPGS is an OoB read in the caller's
> stack frame.
>
> For IDT entries, checking %rsp is reasonable, because userspace can't
> forge a kernel-like %rsp. However, SYSCALL64 specifically leaves %rsp
> entirely attacker controlled (and even potentially non-canonical), so
> I'm wondering what you had in mind for the slowpath to truly
> distinguish kernel context from user context?
Hm, yeah, that seems hard - maybe the best we could do is to make sure
that the inactive gsbase has the correct value for our CPU's kernel
gsbase? Kinda like a paranoid_entry, except more painful because we'd
first have to figure out a place to spill registers to before we can
start using stuff like rdmsr... Then a function pointer overwrite
might still turn into returning to userspace with a sysret with GPRs
full of kernel pointers, but at least we wouldn't run off of a bogus
gsbase anymore?
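
For reference, the fast-path check quoted above ("test rsp, rsp; js
slowpath") can be modelled in C roughly as below. This is only an
illustrative sketch, not kernel code: the enum and function names are
made up for this example, and it models just the sign-bit test, not
whatever the slowpath would then have to do.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; none of this exists in the kernel. */
enum entry_path { ENTRY_FAST_SWAPGS, ENTRY_SLOWPATH };

/* Mirrors "test rsp, rsp; js slowpath": JS is taken when bit 63 of rsp is set. */
static bool rsp_sign_bit_set(uint64_t rsp)
{
	return (rsp >> 63) != 0;
}

/*
 * A lower-half rsp can only be a user-supplied value, so the entry is
 * treated as a real SYSCALL and SWAPGS is executed directly. An
 * upper-half (or non-canonical, bit 63 set) rsp goes to the slowpath.
 */
static enum entry_path classify_syscall64_entry(uint64_t rsp)
{
	return rsp_sign_bit_set(rsp) ? ENTRY_SLOWPATH : ENTRY_FAST_SWAPGS;
}
```

Note that a sign bit being set does not prove kernel context, since
userspace can load any value into rsp before executing SYSCALL; the
check only sorts entries into "definitely from userspace" and "needs
a closer look".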