linux-kernel - Re: [RFC] Circumventing FineIBT Via Entrypoints

Message-ID: <CAG48ez0Bt9348i=We3-wJ1QrW-_5R-we7y_S3Q1brhoyEdHJ0Q@mail.gmail.com>
Date: Thu, 13 Feb 2025 20:23:34 +0100
From: Jann Horn <jannh@...gle.com>
To: Jennifer Miller <jmill@....edu>
Cc: Andy Lutomirski <luto@...nel.org>, linux-hardening@...r.kernel.org, kees@...nel.org, 
	joao@...rdrivepizza.com, samitolvanen@...gle.com, 
	kernel list <linux-kernel@...r.kernel.org>, Andrew Cooper <andrew.cooper3@...rix.com>
Subject: Re: [RFC] Circumventing FineIBT Via Entrypoints

On Thu, Feb 13, 2025 at 7:15 AM Jennifer Miller <jmill@....edu> wrote:
> On Wed, Feb 12, 2025 at 11:29:02PM +0100, Jann Horn wrote:
> > +Andy Lutomirski (X86 entry code maintainer)
> >
> > On Wed, Feb 12, 2025 at 10:08 PM Jennifer Miller <jmill@....edu> wrote:
> > > As part of a recently accepted paper we demonstrated that syscall
> > > entrypoints can be misused on x86-64 systems to generically bypass
> > > FineIBT/KERNEL_IBT from forwards-edge control flow hijacking. We
> > > communicated this finding to s@k.o before submitting the paper and were
> > > encouraged to bring the issue to hardening after the paper was accepted to
> > > have a discussion on how to address the issue.
> > >
> > > The bypass takes advantage of the architectural requirement of entrypoints
> > > to begin with the endbr64 instruction and the ability to control GS_BASE
> > > from userspace via wrgsbase, from the FSGSBASE extension, in order to
> > > perform a stack pivot to a ROP-chain.
> >
> > Oh, fun, that's a gnarly quirk.
>
> yeah :)
>
> > Since the kernel, as far as I understand, uses FineIBT without
> > backwards control flow protection (in other words, I think we assume
> > that the kernel stack is trusted?), could we build a cheaper
> > check on that basis somehow? For example, maybe we could do something like:
> >
> > ```
> > endbr64
> > test rsp, rsp
> > js slowpath
> > swapgs
> > ```
> >
> > So we'd have the fast normal case where RSP points to userspace
> > (meaning we can't be coming from the kernel unless our stack has
> > already been pivoted, in which case forward edge protection alone
> > can't help anymore), and the slow case where RSP points to kernel
> > memory - in that case we'd then have to do some slower checks to
> > figure out whether weird userspace is making a syscall with RSP
> > pointing to the kernel, or whether we're coming from hijacked kernel
> > control flow.
>
> I've been tinkering with this idea a bit and came up with something.
>
> In short, we could have the slowpath branch as you suggested. In the
> slowpath we would permit the stack switch and the saving of the registers
> on the stack, but then do a sanity check against the __per_cpu_offset array
> and decide from there whether we should continue executing the entrypoint
> or die/attempt to recover.

One ugly option to avoid the register spilling might be to say
"userspace is not allowed to execute a SYSCALL instruction while RSP
is a kernel address, and if userspace does it anyway, the kernel can
kill the process". Then the slowpath could immediately start using the
GPRs without having to worry about where to save their old values, and
it could read the correct gsbase with the GET_PERCPU_BASE macro. It
would be an ABI change, but one that is probably fairly unlikely to
actually break stuff? But it would require a bit of extra kernel code
on the slowpath, which is kinda annoying...
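
A minimal sketch of that policy, modeled in Python rather than entry
assembly: the only input is the sign bit of RSP, so the decision reduces
to testing bit 63. The function names and the "kill" label are
hypothetical and only illustrate the decision; a real slowpath would
still have to load the trusted gsbase (for example via GET_PERCPU_BASE)
before touching any per-CPU data.

```python
KERNEL_BIT = 1 << 63  # sign bit of a 64-bit register


def rsp_is_kernel_address(rsp: int) -> bool:
    """Model of `test rsp, rsp; js slowpath`: SF mirrors bit 63 of RSP."""
    return bool(rsp & KERNEL_BIT)


def syscall_entry_policy(rsp: int) -> str:
    """Return which path the (hypothetical) entry code would take."""
    if not rsp_is_kernel_address(rsp):
        # Normal case: RSP points to userspace, continue with swapgs.
        return "fastpath"
    # Proposed ABI rule: SYSCALL with a kernel-range RSP is not allowed,
    # so the slowpath may clobber GPRs freely and kill the task.
    return "kill"
```

Under this rule the slowpath never needs to preserve the incoming
register state, which is the point of the ABI change.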

> Here is some napkin asm for this I wrote for the 64-bit syscall entrypoint,
> I think more or less the same could be done for the other entrypoints.
>
> ```
>     endbr64
>     test rsp, rsp
>     js slowpath
>
>     swapgs
>     ~~fastpath continues~~
>
> ; path taken when rsp was a kernel address
> ; we have no choice really but to switch to the stack from the untrusted
> ; gsbase but after doing so we have to be careful about what we put on the
> ; stack
> slowpath:
>     swapgs
>
> ; swap stacks as normal
>     mov    QWORD PTR gs:[rip+0x7f005f85],rsp       # 0x6014 <cpu_tss_rw+20>
>     mov    rsp,QWORD PTR gs:[rip+0x7f02c56d]       # 0x2c618 <pcpu_hot+24>
>
>     ~~normal push and clear GPRs sequence here~~
>
> ; we entered with an rsp in the kernel address range.
> ; we already did swapgs but we don't know if we can trust our gsbase yet.
> ; we should be able to trust the ro_after_init __per_cpu_offset array
> ; though.
>
> ; check that gsbase is the expected value for our current cpu
>     rdtscp
>     mov rax, QWORD PTR [8*rcx-0x7d7be460] <__per_cpu_offset>
>
>     rdgsbase rbx
>
>     cmp rbx, rax
>     je fastpath_after_regs_preserved
>
>     wrgsbase rax
>
> ; if we reach here we are being exploited and should explode or attempt
> ; to recover
> ```
>
> The unfortunate part is that it would still result in the register state
> being dumped on top of some attacker controlled address, so if the error
> path is recoverable someone could still use entrypoints to convert control
> flow hijacking into memory corruption via register dump. So it would kill
> the ability to get ROP but it would still be possible to dump regs over
> modprobe_path, core_pattern, etc.
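
For illustration, the gsbase sanity check in the quoted napkin asm can
be modeled as a table lookup and a compare. This is a sketch under
stated assumptions, not the proposed implementation: the CPU number is
passed in directly (the asm derives it from rdtscp), and the offsets in
PER_CPU_OFFSET are made-up values standing in for the ro_after_init
__per_cpu_offset array.

```python
from typing import Sequence, Tuple


def verify_gsbase(cpu: int, gsbase: int,
                  per_cpu_offset: Sequence[int]) -> Tuple[bool, int]:
    """Compare the active gsbase with the trusted per-CPU offset.

    Returns (matches, trusted_value). On a mismatch the caller would
    restore the trusted value (the `wrgsbase rax` step) and then die
    or attempt to recover.
    """
    trusted = per_cpu_offset[cpu]
    return gsbase == trusted, trusted


# Made-up offsets for a two-CPU system (illustrative values only).
PER_CPU_OFFSET = (0xffff88803ec00000, 0xffff88803ed00000)
```

Note that the check only runs after the registers have already been
pushed through the untrusted gsbase, which is why the register-dump
primitive described above survives it.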

It is annoying that we (as far as I know) don't have a nice clear
security model for what exactly CFI in the kernel is supposed to
achieve - though I guess that's partly because in its current version,
it only happens to protect against cases where an attacker gets a
function pointer overwrite, but not the probably more common cases
where the attacker (also?) gets an object pointer overwrite...

> Does this seem feasible and any better than the alternative of overwriting
> and restoring KERNEL_GS_BASE?

The syscall entry point is a hot path; my main reason for suggesting
the RSP check is that I'm worried about the performance impact of the
gsbase-overwriting approach, but I don't actually have numbers on
that. I figure a test + conditional jump is about the cheapest we can
do... Do we know how many cycles wrgsbase takes, and how serializing
it is? Sadly Agner Fog's tables don't seem to list it...

How would we actually do that overwriting and restoring of
KERNEL_GS_BASE? Would we need a scratch register for that?