Message-ID: <CAG48ez09JuZPt112nnE6N=hS6cfCLkT-iHUAmidQ-QGNGMVoBw@mail.gmail.com>
Date: Wed, 12 Feb 2025 23:29:02 +0100
From: Jann Horn <jannh@...gle.com>
To: Jennifer Miller <jmill@....edu>, Andy Lutomirski <luto@...nel.org>
Cc: linux-hardening@...r.kernel.org, kees@...nel.org, joao@...rdrivepizza.com,
samitolvanen@...gle.com, kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] Circumventing FineIBT Via Entrypoints
+Andy Lutomirski (X86 entry code maintainer)
On Wed, Feb 12, 2025 at 10:08 PM Jennifer Miller <jmill@....edu> wrote:
> As part of a recently accepted paper we demonstrated that syscall
> entrypoints can be misused on x86-64 systems to generically bypass
> FineIBT/KERNEL_IBT from forwards-edge control flow hijacking. We
> communicated this finding to s@k.o before submitting the paper and were
> encouraged to bring the issue to hardening after the paper was accepted to
> have a discussion on how to address the issue.
>
> The bypass takes advantage of the architectural requirement of entrypoints
> to begin with the endbr64 instruction and the ability to control GS_BASE
> from userspace via wrgsbase, provided by the FSGSBASE extension, in order to
> perform a stack pivot to a ROP-chain.
Oh, fun, that's a gnarly quirk.
> Here is a snippet of the 64-bit entrypoint code:
> ```
> entry_SYSCALL_64:
> <+0>: endbr64
> <+4>: swapgs
> <+7>: mov QWORD PTR gs:0x6014,rsp
> <+16>: jmp <entry_SYSCALL_64+36>
> <+18>: mov rsp,cr3
> <+21>: nop
> <+26>: and rsp,0xffffffffffffe7ff
> <+33>: mov cr3,rsp
> <+36>: mov rsp,QWORD PTR gs:0x32c98
> ```
>
> This is a valid target from any indirect callsite under FineIBT due to the
> endbr64 instruction and the lack of a software CFI check. After hijacking
> control flow to the entrypoint, executing swapgs will swap to the user
> controlled GS_BASE, which will be used to set the stack pointer, leading to
> a stack pivot. The rest of the entrypoint will execute with a hijacked
> GS_BASE on a user controlled stack. The stack page we use is one mapped in
> the user address space, and from another thread we race to overwrite return
> addresses on the stack to pivot a second time to a ROP-chain. For this to
> succeed we required a large area of user-controlled kernel memory that can
> serve as the forged GS_BASE address. We did this by spraying 2MB
> Transparent Huge Pages to fill the kernel physical memory map with
> controlled 2MB allocations and guessing relative to the base address of the
> area to hit a page we control.
>
> We evaluated an approach to patching the issue in the paper, but it touched
> the userspace API a bit: it added an error code that syscalls return if
> they are invoked with a kernel address in GS_BASE, which is not a great
> solution.
>
> Linus provided some thoughts on how to potentially address this issue
> in our communication with s@k.o, suggesting the kernel could make the
> KERNEL_GS_BASE match the GS_BASE value so both registers always contain a
> valid kernel address and a confusion induced by executing swapgs an extra
> time cannot occur, and restore the value of KERNEL_GS_BASE ahead of
> executing swapgs in the exit path.
>
> I started working on a patch based on the approach suggested by Linus but I
> haven't been able to get it passing the relevant x86 selftests yet. It
> turned out that it's more than the entrypoint code that needs to be
> modified for it to work: we need to correctly save and restore the user's
> GS_BASE across task switches and ensure it is updated correctly when set
> via arch_prctl and ptrace. Unfortunately, I lack familiarity with those
> parts of the kernel, and my understanding is that the paper will be made
> public in a couple weeks so I didn't want to delay too long on bringing the
> issue to this list.
>
> Assuming this is an issue you all feel is worth addressing, I will continue
> working on providing a patch. I'm concerned though that the overhead from
> adding a wrmsr on both syscall entry and exit to overwrite and restore the
> KERNEL_GS_BASE MSR may be quite high, so any feedback with regard to the
> approach or suggestions of alternate approaches to patching are welcome :)
Since the kernel, as far as I understand, uses FineIBT without
backwards control flow protection (in other words, I think we assume
that the kernel stack is trusted?), could we build a cheaper
check on that basis somehow? For example, maybe we could do something like:
```
endbr64
test rsp, rsp
js slowpath
swapgs
```
So we'd have the fast normal case where RSP points to userspace
(meaning we can't be coming from the kernel unless our stack has
already been pivoted, in which case forward edge protection alone
can't help anymore), and the slow case where RSP points to kernel
memory - in that case we'd then have to do some slower checks to
figure out whether weird userspace is making a syscall with RSP
pointing to the kernel, or whether we're coming from hijacked kernel
control flow.
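
To make the fast/slow split concrete, here is a minimal C model of the decision that `test rsp, rsp` / `js slowpath` would make. It is only a sketch of the idea, not kernel code: `rsp_in_kernel_half`, `classify_entry` and the `entry_path` enum are hypothetical names invented for illustration, and the slower checks needed on the slow path are deliberately not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): models `test rsp, rsp; js`.
 * `js` is taken when the sign flag is set, i.e. when bit 63 of RSP is 1.
 * On x86-64 that covers the upper (kernel) half of the canonical address
 * space, plus any non-canonical value with the top bit set.
 */
static bool rsp_in_kernel_half(uint64_t rsp)
{
    return (rsp >> 63) != 0;
}

/* Hypothetical labels for the two paths described above. */
enum entry_path {
    ENTRY_FAST, /* RSP looks like a userspace pointer: proceed to swapgs */
    ENTRY_SLOW  /* RSP has the top bit set: needs the slower checks */
};

static enum entry_path classify_entry(uint64_t rsp)
{
    return rsp_in_kernel_half(rsp) ? ENTRY_SLOW : ENTRY_FAST;
}
```

With this model, a typical userspace stack pointer such as 0x00007fffffffe000 takes the fast path, while anything with bit 63 set, such as 0xffff888000000000, is sent to the slow path. The slow path would still have to tell apart weird userspace from hijacked kernel control flow.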