Message-ID: <Z60NwR4w/28Z7XUa@ubun>
Date: Wed, 12 Feb 2025 14:08:17 -0700
From: Jennifer Miller <jmill@....edu>
To: linux-hardening@...r.kernel.org
Cc: kees@...nel.org, joao@...rdrivepizza.com, samitolvanen@...gle.com
Subject: [RFC] Circumventing FineIBT Via Entrypoints
Hi All,
As part of a recently accepted paper, we demonstrated that syscall
entrypoints can be misused on x86-64 systems to generically bypass
FineIBT/KERNEL_IBT starting from a forward-edge control flow hijack. We
communicated this finding to s@k.o before submitting the paper and were
encouraged to bring the issue to hardening after the paper was accepted,
to discuss how to address it.
The bypass takes advantage of the architectural requirement that entrypoints
begin with the endbr64 instruction, and of the ability to control GS_BASE
from userspace via wrgsbase, thanks to the FSGSBASE extension, in order to
perform a stack pivot to a ROP-chain.
Here is a snippet of the 64-bit entrypoint code:
```
entry_SYSCALL_64:
<+0>: endbr64
<+4>: swapgs
<+7>: mov QWORD PTR gs:0x6014,rsp
<+16>: jmp <entry_SYSCALL_64+36>
<+18>: mov rsp,cr3
<+21>: nop
<+26>: and rsp,0xffffffffffffe7ff
<+33>: mov cr3,rsp
<+36>: mov rsp,QWORD PTR gs:0x32c98
```
This is a valid target from any indirect callsite under FineIBT due to the
endbr64 instruction and the lack of a software CFI check. After hijacking
control flow to the entrypoint, executing swapgs will swap to the
user-controlled GS_BASE, which will be used to set the stack pointer, leading
to a stack pivot. The rest of the entrypoint will execute with a hijacked
GS_BASE on a user-controlled stack. The stack page we use is one mapped in
the user address space, and from another thread we race to overwrite return
addresses on the stack to pivot a second time to a ROP-chain. For this to
succeed we required a large area of user-controlled kernel memory that can
serve as the forged GS_BASE address. We did this by spraying 2MB
Transparent Huge Pages to fill the kernel physical memory map with
controlled 2MB allocations and guessing relative to the base address of the
area to hit a page we control.
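The register confusion described above can be illustrated with a small
userspace model. This is only a sketch, not kernel code: struct cpu_model and
the model_* helpers are hypothetical names invented for illustration, and the
model tracks nothing but the two GS base values.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the two GS base registers (hypothetical, not kernel code). */
struct cpu_model {
    uint64_t gs_base;        /* value currently used for gs: accesses */
    uint64_t kernel_gs_base; /* inactive value, exchanged by swapgs */
};

/* swapgs exchanges the active and inactive GS base values. */
static void model_swapgs(struct cpu_model *cpu)
{
    uint64_t tmp = cpu->gs_base;

    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/* With FSGSBASE, userspace can set the active GS base directly. */
static void model_wrgsbase(struct cpu_model *cpu, uint64_t value)
{
    cpu->gs_base = value;
}

/*
 * Returns the GS base that the entrypoint's "mov rsp, gs:..." would read
 * through when an indirect call made inside the kernel is redirected to
 * the entrypoint, so that swapgs executes a second time.
 */
uint64_t gs_base_after_hijacked_entry(uint64_t kernel_percpu,
                                      uint64_t attacker_value)
{
    struct cpu_model cpu = { .gs_base = 0, .kernel_gs_base = kernel_percpu };

    model_wrgsbase(&cpu, attacker_value); /* user mode: attacker picks GS_BASE */
    model_swapgs(&cpu);                   /* legitimate syscall entry */
    assert(cpu.gs_base == kernel_percpu); /* kernel runs on its own GS base */
    model_swapgs(&cpu);                   /* hijacked call re-runs the entrypoint */
    return cpu.gs_base;                   /* attacker's value is active again */
}
```

In this model the second swapgs always hands control of the gs-relative
stack pointer load back to whatever value userspace last wrote.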
We evaluated an approach to patching the issue in the paper, but it touched
the userspace API a bit: it added an error code returned by syscalls if they
are invoked with a kernel address in GS_BASE, which is not a great
solution.
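For illustration, a check of that kind could look roughly like the sketch
below. It is not the code evaluated in the paper: the function names and the
HYPOTHETICAL_EBADGS value are placeholders, and the upper-half test is a
simplification of whatever user/kernel address limit real kernel code would
compare against.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder error value; the actual error code is not given here. */
#define HYPOTHETICAL_EBADGS (-1)

/*
 * On x86-64 Linux, kernel addresses live in the upper canonical half,
 * so bit 63 is set. This is a simplified stand-in for a real limit check.
 */
static bool is_kernel_half_address(uint64_t addr)
{
    return (addr >> 63) != 0;
}

/*
 * Hypothetical entry-time check: refuse the syscall when the user's
 * GS base points into the kernel half of the address space.
 */
int check_user_gs_base(uint64_t user_gs_base)
{
    if (is_kernel_half_address(user_gs_base))
        return HYPOTHETICAL_EBADGS;
    return 0;
}
```

The userspace-visible cost is that a program which legitimately stored such a
value in GS_BASE would now see its syscalls fail.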
Linus provided some thoughts on how to potentially address this issue
in our communication with s@k.o, suggesting the kernel could make the
KERNEL_GS_BASE match the GS_BASE value so both registers always contain a
valid kernel address and a confusion induced by executing swapgs an extra
time cannot occur, and restore the value of KERNEL_GS_BASE ahead of
executing swapgs in the exit path.
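A minimal userspace model of that idea, assuming the user's value is kept in
software while the kernel runs. The struct and function names are
hypothetical, and the plain assignments to kernel_gs_base stand in for the
MSR writes a real implementation would need.

```c
#include <stdint.h>

/* Toy model of the two GS base registers (hypothetical, not kernel code). */
struct cpu_model {
    uint64_t gs_base;        /* value currently used for gs: accesses */
    uint64_t kernel_gs_base; /* inactive value, exchanged by swapgs */
};

struct mitigated_state {
    struct cpu_model cpu;
    uint64_t saved_user_gs_base; /* user's value, held in software */
};

static void model_swapgs(struct cpu_model *cpu)
{
    uint64_t tmp = cpu->gs_base;

    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/* Entry: after swapgs, mirror the kernel value into the inactive register. */
static void mitigated_entry(struct mitigated_state *s)
{
    model_swapgs(&s->cpu);
    s->saved_user_gs_base = s->cpu.kernel_gs_base; /* remember user value */
    s->cpu.kernel_gs_base = s->cpu.gs_base;        /* models the extra MSR write */
}

/* Exit: put the user's value back before the final swapgs. */
static void mitigated_exit(struct mitigated_state *s)
{
    s->cpu.kernel_gs_base = s->saved_user_gs_base; /* models the restoring write */
    model_swapgs(&s->cpu);
}

/* A stray swapgs while in the kernel still leaves a kernel value active. */
uint64_t gs_base_after_stray_swapgs(uint64_t kernel_percpu, uint64_t user_value)
{
    struct mitigated_state s = {
        .cpu = { .gs_base = user_value, .kernel_gs_base = kernel_percpu },
        .saved_user_gs_base = 0,
    };

    mitigated_entry(&s);
    model_swapgs(&s.cpu); /* extra swapgs, as triggered by a hijacked call */
    return s.cpu.gs_base;
}

/* A normal entry/exit round trip hands the user's value back unchanged. */
uint64_t user_gs_base_after_round_trip(uint64_t kernel_percpu, uint64_t user_value)
{
    struct mitigated_state s = {
        .cpu = { .gs_base = user_value, .kernel_gs_base = kernel_percpu },
        .saved_user_gs_base = 0,
    };

    mitigated_entry(&s);
    mitigated_exit(&s);
    return s.cpu.gs_base;
}
```

The model only covers the extra swapgs itself; it says nothing about task
switches or the other paths that read or write the user's GS_BASE.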
I started working on a patch based on the approach suggested by Linus, but I
haven't been able to get it passing the relevant x86 selftests yet. It
turned out that more than the entrypoint code needs to be modified for it
to work: we need to correctly save and restore the user's GS_BASE across
task switches and ensure it is updated correctly when set via arch_prctl
and ptrace. Unfortunately, I lack familiarity with those parts of the
kernel, and my understanding is that the paper will be made public in a
couple of weeks, so I didn't want to delay too long on bringing the issue
to this list.
Assuming this is an issue you all feel is worth addressing, I will continue
working on providing a patch. I'm concerned, though, that the overhead from
adding a wrmsr on both syscall entry and exit to overwrite and restore the
KERNEL_GS_BASE MSR may be quite high, so any feedback on the approach or
suggestions of alternate approaches to patching are welcome :)
~Jennifer