linux-kernel - Re: [RFC] Circumventing FineIBT Via Entrypoints

Open Source and information security mailing list archives

Message-ID: <60447cd2-a8da-4be6-80fa-a5639b7455b1@citrix.com>
Date: Thu, 13 Feb 2025 21:24:18 +0000
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: Jann Horn <jannh@...gle.com>, Jennifer Miller <jmill@....edu>
Cc: Andy Lutomirski <luto@...nel.org>, linux-hardening@...r.kernel.org,
 kees@...nel.org, joao@...rdrivepizza.com, samitolvanen@...gle.com,
 kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] Circumventing FineIBT Via Entrypoints

On 13/02/2025 7:23 pm, Jann Horn wrote:
> On Thu, Feb 13, 2025 at 7:15 AM Jennifer Miller <jmill@....edu> wrote:
>> Here is some napkin asm for this I wrote for the 64-bit syscall entrypoint,
>> I think more or less the same could be done for the other entrypoints.
>>
>> ```
>>     endbr64
>>     test rsp, rsp
>>     js slowpath
>>
>>     swapgs
>>     ~~fastpath continues~~
>>
>> ; path taken when rsp was a kernel address
>> ; we have no choice really but to switch to the stack from the untrusted
>> ; gsbase but after doing so we have to be careful about what we put on the
>> ; stack
>> slowpath:
>>     swapgs

I'm afraid I don't follow.  By this point, both basic blocks are the
same (a single swapgs).

Malicious userspace can get onto the slowpath by loading a kernel
pointer into %rsp.  Furthermore, if the origin of this really was in the
kernel, then ...

>>
>> ; swap stacks as normal
>>     mov    QWORD PTR gs:[rip+0x7f005f85],rsp       # 0x6014 <cpu_tss_rw+20>
>>     mov    rsp,QWORD PTR gs:[rip+0x7f02c56d]       # 0x2c618 <pcpu_hot+24>

... these are memory accesses using the user %gs.  As you note a few
lines lower, %gs isn't safe at this point.

A cunning attacker can make gs:[rip+0x7f02c56d] be a read-only mapping,
at which point we'll have loaded an attacker-controlled %rsp, then take #PF
trying to spill %rsp into pcpu_hot, and now we're running the pagefault
handler on an attacker-controlled stack and gsbase.
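The hazard can be made concrete with a hypothetical toy model in C. It is not kernel code, and every name in it is invented for illustration. It assumes the kernel-origin case, where the slowpath's swapgs has installed an attacker-chosen gsbase, and it follows the quoted snippet's order: spill %rsp through %gs, then load the new %rsp through %gs. The sign-bit test looks only at bit 63 of a register that userspace can set to anything before SYSCALL. A read-only page makes the spill fault while the original %rsp and gsbase are still live. A writable page simply hands over an attacker-chosen stack pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of the quoted slowpath; not kernel code. */

/* "test rsp, rsp; js slowpath": only bit 63 of %rsp is inspected, and
 * userspace can load any value into %rsp before executing SYSCALL. */
static bool takes_slowpath(uint64_t rsp)
{
    return (rsp >> 63) != 0;
}

/* Register state after the slowpath's swapgs. In the kernel-origin
 * case (assumed here), gsbase now holds the user-controlled value. */
struct toy_cpu {
    uint64_t rsp;
    uint64_t gsbase;
};

/* Memory that the gs-relative operands resolve to. Its contents and
 * permissions are chosen by whoever controls gsbase. */
struct toy_gs_page {
    bool writable;          /* false models a read-only mapping */
    uint64_t spilled_rsp;   /* models gs:[cpu_tss_rw+20] */
    uint64_t top_of_stack;  /* models gs:[pcpu_hot+24] */
};

enum toy_outcome {
    TOY_FAULT_ON_SPILL,  /* #PF raised; rsp and gsbase are left unchanged */
    TOY_STACK_LOADED,    /* rsp replaced by a value read through gs */
};

/* Same order as the quoted asm: spill rsp, then load the new rsp. */
static enum toy_outcome toy_stack_switch(struct toy_cpu *cpu,
                                         struct toy_gs_page *page)
{
    if (!page->writable)
        return TOY_FAULT_ON_SPILL;  /* fault delivered on cpu->rsp, cpu->gsbase */
    page->spilled_rsp = cpu->rsp;
    cpu->rsp = page->top_of_stack;
    return TOY_STACK_LOADED;
}
```

Either outcome leaves the entry code running on state chosen through the untrusted gsbase, which is why the slowpath cannot touch %gs before gsbase has been validated.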

>>     ~~normal push and clear GPRs sequence here~~
>>
>> ; we entered with an rsp in the kernel address range.
>> ; we already did swapgs but we don't know if we can trust our gsbase yet.
>> ; we should be able to trust the ro_after_init __per_cpu_offset array
>> ; though.
>>
>> ; check that gsbase is the expected value for our current cpu
>>     rdtscp
>>     mov rax, QWORD PTR [8*ecx-0x7d7be460] <__per_cpu_offset>
>>
>>     rdgsbase rbx
>>
>>     cmp rbx, rax
>>     je fastpath_after_regs_preserved
>>
>>     wrgsbase rax

Irrespective of other things, you'll need some compatibility strategy
for the fact that RDTSCP and {RD,WR}{FS,GS}BASE cannot be used
unconditionally in 64bit mode.  It might be as simple as making FineIBT
depend on their presence to activate, but taking a #UD exception in this
path is also a priv-esc vulnerability.

While all CET-IBT capable CPUs ought to have RDTSCP/*BASE, there are
virt environments where this implication does not hold.
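One way to express the "depend on their presence" option is a gate computed from CPUID, so that a guest whose hypervisor hides RDTSCP or FSGSBASE never executes the check and therefore can never take #UD on the entry path. The sketch below is hypothetical: the struct and function names are invented, and it takes raw CPUID output words as input. The bit positions are the architectural ones (CPUID.(EAX=7,ECX=0):EBX[0] for FSGSBASE, CPUID.(EAX=7,ECX=0):EDX[20] for CET-IBT, CPUID.80000001H:EDX[27] for RDTSCP). RDGSBASE/WRGSBASE additionally #UD unless CR4.FSGSBASE is set, so that is a separate input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural CPUID feature bits. */
#define LEAF7_EBX_FSGSBASE (UINT32_C(1) << 0)   /* leaf 0x7, subleaf 0, EBX */
#define LEAF7_EDX_CET_IBT  (UINT32_C(1) << 20)  /* leaf 0x7, subleaf 0, EDX */
#define EXT1_EDX_RDTSCP    (UINT32_C(1) << 27)  /* leaf 0x80000001, EDX */

/* Hypothetical container for the CPUID words the gate needs. */
struct cpuid_words {
    uint32_t leaf7_ebx;
    uint32_t leaf7_edx;
    uint32_t ext1_edx;
};

/* Hypothetical policy: enable the entry-point gsbase check (and hence
 * FineIBT) only if every instruction the check executes is usable. */
static bool fineibt_entry_check_usable(const struct cpuid_words *w,
                                       bool cr4_fsgsbase_enabled)
{
    bool ibt    = (w->leaf7_edx & LEAF7_EDX_CET_IBT) != 0;
    bool fsgs   = (w->leaf7_ebx & LEAF7_EBX_FSGSBASE) != 0 && cr4_fsgsbase_enabled;
    bool rdtscp = (w->ext1_edx & EXT1_EDX_RDTSCP) != 0;

    /* CET-IBT being enumerated does not imply the others in a VM. */
    return ibt && fsgs && rdtscp;
}
```

Gating on enumeration rather than inferring RDTSCP/*BASE from CET-IBT covers the virtualised case where the implication does not hold.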

>>
>> ; if we reach here we are being exploited and should explode or attempt
>> ; to recover
>> ```
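The gsbase check in the quoted snippet can be modelled in C as follows. This is a hypothetical sketch: per_cpu_offset_toy stands in for __per_cpu_offset[] with made-up values and, like the snippet, the sketch assumes the expected gsbase for a CPU equals that CPU's __per_cpu_offset entry. It also assumes Linux programs TSC_AUX as (node << 12) | cpu (the vdso_encode_cpunode() layout). Under that assumption, the ECX value returned by RDTSCP carries the node number on NUMA systems and needs masking before it is used as an index, which the napkin asm's [8*ecx-...] addressing does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: TSC_AUX = (node << 12) | cpu, as in Linux's
 * VDSO_CPUNODE_BITS / vdso_encode_cpunode(). */
#define TOY_CPUNODE_BITS 12
#define TOY_CPUNODE_MASK ((UINT32_C(1) << TOY_CPUNODE_BITS) - 1)

/* Hypothetical stand-in for the ro_after_init __per_cpu_offset[] array. */
static const uint64_t per_cpu_offset_toy[] = {
    0xffff88803ea00000ULL,
    0xffff88803ec00000ULL,
    0xffff88807ea00000ULL,
    0xffff88807ec00000ULL,
};
#define TOY_NR_CPUS (sizeof(per_cpu_offset_toy) / sizeof(per_cpu_offset_toy[0]))

/* Extract the CPU number from the ECX value returned by RDTSCP. */
static uint32_t toy_cpu_from_tsc_aux(uint32_t tsc_aux)
{
    return tsc_aux & TOY_CPUNODE_MASK;
}

/* True if the gsbase read via RDGSBASE after swapgs is the one this CPU
 * should have. An out-of-range CPU number is treated as untrusted. */
static bool toy_gsbase_trusted(uint32_t tsc_aux, uint64_t gsbase)
{
    uint32_t cpu = toy_cpu_from_tsc_aux(tsc_aux);

    if (cpu >= TOY_NR_CPUS)
        return false;
    return gsbase == per_cpu_offset_toy[cpu];
}
```

The bounds check is defensive only, since TSC_AUX is written by the kernel rather than by userspace.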
>>
>> The unfortunate part is that it would still result in the register state
>> being dumped on top of some attacker controlled address, so if the error
>> path is recoverable someone could still use entrypoints to convert control
>> flow hijacking into memory corruption via register dump. So it would kill
>> the ability to get ROP but it would still be possible to dump regs over
>> modprobe_path, core_pattern, etc.
> It is annoying that we (as far as I know) don't have a nice clear
> security model for what exactly CFI in the kernel is supposed to
> achieve - though I guess that's partly because in its current version,
> it only happens to protect against cases where an attacker gets a
> function pointer overwrite, but not the probably more common cases
> where the attacker (also?) gets an object pointer overwrite...
>
>> Does this seem feasible and any better than the alternative of overwriting
>> and restoring KERNEL_GS_BASE?
> The syscall entry point is a hot path; my main reason for suggesting
> the RSP check is that I'm worried about the performance impact of the
> gsbase-overwriting approach, but I don't actually have numbers on
> that. I figure a test + conditional jump is about the cheapest we can
> do...

Yeah, this is the cheapest I can think of too.  TEST+JS has been able to
macrofuse since the Core2 era.

> Do we know how many cycles wrgsbase takes, and how serializing
> is it? Sadly Agner Fog's tables don't seem to list it...

Not (architecturally) serialising, and pretty quick IIRC.  It is
microcoded, but the segment registers are renamed so it can execute
speculatively.

~Andrew

>
> How would we actually do that overwriting and restoring of
> KERNEL_GS_BASE? Would we need a scratch register for that?