linux-kernel - Re: [RFC] Circumventing FineIBT Via Entrypoints

Message-ID: <Z65/Fpd9cnUk8TjE@ubun>
Date: Thu, 13 Feb 2025 16:24:06 -0700
From: Jennifer Miller <jmill@....edu>
To: Andrew Cooper <andrew.cooper3@...rix.com>, Jann Horn <jannh@...gle.com>
Cc: Andy Lutomirski <luto@...nel.org>, linux-hardening@...r.kernel.org,
	kees@...nel.org, joao@...rdrivepizza.com, samitolvanen@...gle.com,
	kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] Circumventing FineIBT Via Entrypoints

On Thu, Feb 13, 2025 at 09:24:18PM +0000, Andrew Cooper wrote:
> On 13/02/2025 7:23 pm, Jann Horn wrote:
> > On Thu, Feb 13, 2025 at 7:15 AM Jennifer Miller <jmill@....edu> wrote:
> >> Here is some napkin asm for this I wrote for the 64-bit syscall entrypoint,
> >> I think more or less the same could be done for the other entrypoints.
> >>
> >> ```
> >>     endbr64
> >>     test rsp, rsp
> >>     js slowpath
> >>
> >>     swapgs
> >>     ~~fastpath continues~~
> >>
> >> ; path taken when rsp was a kernel address
> >> ; we have no choice really but to switch to the stack from the untrusted
> >> ; gsbase but after doing so we have to be careful about what we put on the
> >> ; stack
> >> slowpath:
> >>     swapgs
> 
> I'm afraid I don't follow.  By this point, both basic blocks are the
> same (a single swapgs).

Ah sure, the test/js could be moved to occur after swapgs to save an
instruction.

>
> Malicious userspace can get onto the slowpath by loading a kernel
> pointer into %rsp.  Furthermore, if the origin of this really was in the
> kernel, then ...
> 
> >>
> >> ; swap stacks as normal
> >>     mov    QWORD PTR gs:[rip+0x7f005f85],rsp       # 0x6014 <cpu_tss_rw+20>
> >>     mov    rsp,QWORD PTR gs:[rip+0x7f02c56d]       # 0x2c618 <pcpu_hot+24>
> 
> ... these are memory accesses using the user %gs.  As you note a few
> lines lower, %gs isn't safe at this point.
> 
> A cunning attacker can make gs:[rip+0x7f02c56d] be a read-only mapping,
> at which point we'll have loaded an attacker controlled %rsp, then take #PF
> trying to spill %rsp into pcpu_hot, and now we're running the pagefault
> handler on an attacker controlled stack and gsbase.
> 

I don't follow, the spill of %rsp into pcpu_hot occurs first, before we
would move to the attacker controlled stack. This is Intel asm syntax,
sorry if that was unclear.

Still, I hadn't considered misusing read-only/unmapped pages on the GPR
spill that follows. Could we enforce that the stack pointer we load is
page aligned to prevent this vector, so that any attempt to point the stack
at read-only or unmapped memory is guaranteed to double fault?
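
To make the two stack pointer checks concrete, here is a rough user-space
C model of them. It is a sketch and not kernel code: the function names
are made up for illustration, and the 4 KiB page size is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096ULL

/*
 * Models "test rsp, rsp; js slowpath": on x86-64 the top bit is set for
 * addresses in the kernel half of the canonical address space.
 */
static bool rsp_in_kernel_half(uint64_t rsp)
{
    return (rsp >> 63) != 0;
}

/* Models the proposed extra requirement that the loaded rsp be page aligned. */
static bool rsp_is_page_aligned(uint64_t rsp)
{
    return (rsp & (MODEL_PAGE_SIZE - 1)) == 0;
}
```

In the entry code the alignment test would be a mask and a conditional
jump on the value read from pcpu_hot, before anything is pushed.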

> >>     ~~normal push and clear GPRs sequence here~~
> >>
> >> ; we entered with an rsp in the kernel address range.
> >> ; we already did swapgs but we don't know if we can trust our gsbase yet.
> >> ; we should be able to trust the ro_after_init __per_cpu_offset array
> >> ; though.
> >>
> >> ; check that gsbase is the expected value for our current cpu
> >>     rdtscp
> >>     mov rax, QWORD PTR [8*ecx-0x7d7be460] <__per_cpu_offset>
> >>
> >>     rdgsbase rbx
> >>
> >>     cmp rbx, rax
> >>     je fastpath_after_regs_preserved
> >>
> >>     wrgsbase rax
> 
> Irrespective of other things, you'll need some compatibility strategy
> for the fact that RDTSCP and {RD,WR}{FS,GS}BASE cannot be used
> unconditionally in 64bit mode.  It might be as simple as making FineIBT
> depend on their presence to activate, but taking a #UD exception in this
> path is also a priv-esc vulnerability.

Sure, we could rdmsr IA32_TSC_AUX in place of rdtscp. After the wrgsbase
we could switch to the expected kernel stack, now that gsbase is fixed,
before taking any #UD.
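
The gsbase check in the quoted asm boils down to a table lookup and a
compare. A rough user-space C model follows. It is only a sketch: the
table contents, the table size, and the function name are all made up for
illustration, and extracting the CPU number from IA32_TSC_AUX is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 4

/* Stand-in for the ro_after_init __per_cpu_offset array; values are invented. */
static const uint64_t model_per_cpu_offset[MODEL_NR_CPUS] = {
    0xffff88803ec00000ULL,
    0xffff88803ed00000ULL,
    0xffff88803ee00000ULL,
    0xffff88803ef00000ULL,
};

/*
 * Returns true when gsbase is the expected per-CPU base for this CPU.
 * For a valid cpu, *expected receives the value that wrgsbase would
 * restore on a mismatch. For an out-of-range cpu, *expected is left alone.
 */
static bool gsbase_matches_cpu(uint32_t cpu, uint64_t gsbase, uint64_t *expected)
{
    if (cpu >= MODEL_NR_CPUS)
        return false;
    *expected = model_per_cpu_offset[cpu];
    return gsbase == *expected;
}
```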

> 
> While all CET-IBT capable CPUs ought to have RDTSCP/*BASE, there are
> virt environments where this implication does not hold.
> 
> >>
> >> ; if we reach here we are being exploited and should explode or attempt
> >> ; to recover
> >> ```
> >>
> >> The unfortunate part is that it would still result in the register state
> >> being dumped on top of some attacker controlled address, so if the error
> >> path is recoverable someone could still use entrypoints to convert control
> >> flow hijacking into memory corruption via register dump. So it would kill
> >> the ability to get ROP but it would still be possible to dump regs over
> >> modprobe_path, core_pattern, etc.
> > It is annoying that we (as far as I know) don't have a nice clear
> > security model for what exactly CFI in the kernel is supposed to
> > achieve - though I guess that's partly because in its current version,
> > it only happens to protect against cases where an attacker gets a
> > function pointer overwrite, but not the probably more common cases
> > where the attacker (also?) gets an object pointer overwrite...
> >
> >> Does this seem feasible and any better than the alternative of overwriting
> >> and restoring KERNEL_GS_BASE?
> > The syscall entry point is a hot path; my main reason for suggesting
> > the RSP check is that I'm worried about the performance impact of the
> > gsbase-overwriting approach, but I don't actually have numbers on
> > that. I figure a test + conditional jump is about the cheapest we can
> > do...
> 
> Yeah, this is the cheapest I can think of too.  TEST+JS has been able to
> macrofuse since the Core2 era.
> 
> > Do we know how many cycles wrgsbase takes, and how serializing
> > is it? Sadly Agner Fog's tables don't seem to list it...
> 
> Not (architecturally) serialising, and pretty quick IIRC.  It is
> microcoded, but the segment registers are renamed so it can execute
> speculatively.
> 
> ~Andrew
> 
> >
> > How would we actually do that overwriting and restoring of
> > KERNEL_GS_BASE? Would we need a scratch register for that?
> 

I think we can do the overwrite at any point before actually calling into
the individual syscall handlers, i.e. anywhere before potentially hijacked
indirect control flow can occur, and then restore it just after those
handlers return. For the 64-bit path, for example, I am currently
overwriting it at the start of do_syscall_64 and restoring it just before
syscall_exit_to_user_mode. I'm not sure there is any reason to do it
sooner, while we'd still be register constrained.
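
As a rough illustration of the ordering, here is a user-space C model of
the overwrite/restore. It is a sketch only. The struct and function names
are hypothetical, and it assumes the value written over KERNEL_GS_BASE is
the kernel's own gsbase, which makes a stray swapgs harmless.

```c
#include <stdint.h>

/* Toy model of the two per-CPU GS base values that swapgs exchanges. */
struct cpu_model {
    uint64_t gs_base;        /* active gsbase */
    uint64_t kernel_gs_base; /* stand-in for the KERNEL_GS_BASE MSR */
};

/* Models swapgs: exchange the active base with the MSR value. */
static void swapgs_model(struct cpu_model *c)
{
    uint64_t tmp = c->gs_base;
    c->gs_base = c->kernel_gs_base;
    c->kernel_gs_base = tmp;
}

/*
 * Start of the syscall body: save the user value and overwrite the MSR.
 * Assumption: the overwrite value is the current (kernel) gsbase.
 */
static uint64_t overwrite_kernel_gs_base(struct cpu_model *c)
{
    uint64_t saved_user = c->kernel_gs_base;
    c->kernel_gs_base = c->gs_base;
    return saved_user;
}

/* Just before returning to user mode: put the user value back. */
static void restore_kernel_gs_base(struct cpu_model *c, uint64_t saved_user)
{
    c->kernel_gs_base = saved_user;
}
```

In this model, a hijacked jump to the entrypoint between the two calls
runs swapgs with both values equal, so gsbase stays a kernel address.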

~Jennifer