Message-ID: <60447cd2-a8da-4be6-80fa-a5639b7455b1@citrix.com>
Date: Thu, 13 Feb 2025 21:24:18 +0000
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: Jann Horn <jannh@...gle.com>, Jennifer Miller <jmill@....edu>
Cc: Andy Lutomirski <luto@...nel.org>, linux-hardening@...r.kernel.org,
kees@...nel.org, joao@...rdrivepizza.com, samitolvanen@...gle.com,
kernel list <linux-kernel@...r.kernel.org>
Subject: Re: [RFC] Circumventing FineIBT Via Entrypoints
On 13/02/2025 7:23 pm, Jann Horn wrote:
> On Thu, Feb 13, 2025 at 7:15 AM Jennifer Miller <jmill@....edu> wrote:
>> Here is some napkin asm for this I wrote for the 64-bit syscall entrypoint,
>> I think more or less the same could be done for the other entrypoints.
>>
>> ```
>> endbr64
>> test rsp, rsp
>> js slowpath
>>
>> swapgs
>> ~~fastpath continues~~
>>
>> ; path taken when rsp was a kernel address
>> ; we have no choice really but to switch to the stack from the untrusted
>> ; gsbase but after doing so we have to be careful about what we put on the
>> ; stack
>> slowpath:
>> swapgs
I'm afraid I don't follow. By this point, both basic blocks are the
same (a single swapgs).
Malicious userspace can get onto the slowpath by loading a kernel
pointer into %rsp. Furthermore, if the origin of this really was in the
kernel, then ...
>>
>> ; swap stacks as normal
>> mov QWORD PTR gs:[rip+0x7f005f85],rsp # 0x6014 <cpu_tss_rw+20>
>> mov rsp,QWORD PTR gs:[rip+0x7f02c56d] # 0x2c618 <pcpu_hot+24>
... these are memory accesses using the user %gs. As you note a few
lines lower, %gs isn't safe at this point.
A cunning attacker can make gs:[rip+0x7f02c56d] be a read-only mapping,
at which point we'll have loaded an attacker-controlled %rsp, then take #PF
trying to spill %rsp into pcpu_hot, and now we're running the pagefault
handler on an attacker-controlled stack and gsbase.
>> ~~normal push and clear GPRs sequence here~~
>>
>> ; we entered with an rsp in the kernel address range.
>> ; we already did swapgs but we don't know if we can trust our gsbase yet.
>> ; we should be able to trust the ro_after_init __per_cpu_offset array
>> ; though.
>>
>> ; check that gsbase is the expected value for our current cpu
>> rdtscp
>> mov rax, QWORD PTR [8*ecx-0x7d7be460] <__per_cpu_offset>
>>
>> rdgsbase rbx
>>
>> cmp rbx, rax
>> je fastpath_after_regs_preserved
>>
>> wrgsbase rax
Irrespective of other things, you'll need some compatibility strategy
for the fact that RDTSCP and {RD,WR}{FS,GS}BASE cannot be used
unconditionally in 64bit mode. It might be as simple as making FineIBT
depend on their presence to activate, but taking a #UD exception in this
path is also a priv-esc vulnerability.
While all CET-IBT capable CPUs ought to have RDTSCP/*BASE, there are
virt environments where this implication does not hold.
>>
>> ; if we reach here we are being exploited and should explode or attempt
>> ; to recover
>> ```
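To make the RDTSCP / {RD,WR}{FS,GS}BASE dependency concrete, here is a
minimal sketch of the kind of gate that could decide whether such an entry
check may be enabled at all. The CPUID bit positions are architectural;
the struct cpuid_snapshot type and the entry_check_usable() helper are
hypothetical names made up for illustration, not existing kernel code.
Note that CPUID advertising FSGSBASE is not enough on its own: the
instructions still #UD unless CR4.FSGSBASE is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural CPUID feature bits. */
#define CPUID_7_0_EBX_FSGSBASE    (UINT32_C(1) << 0)  /* leaf 7.0, EBX[0]  */
#define CPUID_7_0_EDX_CET_IBT     (UINT32_C(1) << 20) /* leaf 7.0, EDX[20] */
#define CPUID_80000001_EDX_RDTSCP (UINT32_C(1) << 27) /* leaf 0x80000001, EDX[27] */

/* Hypothetical container for the raw CPUID register values of interest. */
struct cpuid_snapshot {
    uint32_t leaf7_ebx; /* CPUID.(EAX=7,ECX=0):EBX */
    uint32_t leaf7_edx; /* CPUID.(EAX=7,ECX=0):EDX */
    uint32_t ext1_edx;  /* CPUID.80000001H:EDX     */
};

/*
 * Hypothetical policy: only use the RDTSCP + RDGSBASE/WRGSBASE check when
 * IBT and both instruction families are enumerated. A hypervisor may expose
 * IBT while hiding either of the others, so each bit is tested separately.
 */
static bool entry_check_usable(const struct cpuid_snapshot *c)
{
    assert(c != NULL);

    return (c->leaf7_edx & CPUID_7_0_EDX_CET_IBT) &&
           (c->leaf7_ebx & CPUID_7_0_EBX_FSGSBASE) &&
           (c->ext1_edx & CPUID_80000001_EDX_RDTSCP);
}
```

With a gate like this, a guest that sees IBT but not RDTSCP would simply
not activate the check, rather than risk a #UD in the entry path.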
>>
>> The unfortunate part is that it would still result in the register state
>> being dumped on top of some attacker controlled address, so if the error
>> path is recoverable someone could still use entrypoints to convert control
>> flow hijacking into memory corruption via register dump. So it would kill
>> the ability to get ROP but it would still be possible to dump regs over
>> modprobe_path, core_pattern, etc.
> It is annoying that we (as far as I know) don't have a nice clear
> security model for what exactly CFI in the kernel is supposed to
> achieve - though I guess that's partly because in its current version,
> it only happens to protect against cases where an attacker gets a
> function pointer overwrite, but not the probably more common cases
> where the attacker (also?) gets an object pointer overwrite...
>
>> Does this seem feasible and any better than the alternative of overwriting
>> and restoring KERNEL_GS_BASE?
> The syscall entry point is a hot path; my main reason for suggesting
> the RSP check is that I'm worried about the performance impact of the
> gsbase-overwriting approach, but I don't actually have numbers on
> that. I figure a test + conditional jump is about the cheapest we can
> do...
Yeah, this is the cheapest I can think of too. TEST+JS has been able to
macrofuse since the Core2 era.
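For reference, the predicate that `test rsp, rsp; js slowpath` evaluates
can be written out in C. This is only a sketch of the condition, with
rsp_in_kernel_half() being a hypothetical name; the real check would of
course stay in asm in the entry path.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * TEST sets SF to bit 63 of its result, and JS branches when SF is set,
 * so the pair takes the slowpath exactly when bit 63 of %rsp is 1.
 * Only the top bit is inspected: any value with bit 63 set, including a
 * non-canonical one, is treated as "kernel half".
 */
static inline bool rsp_in_kernel_half(uint64_t rsp)
{
    return (rsp >> 63) != 0;
}
```

Since userspace can load an arbitrary 64-bit value into %rsp before
SYSCALL, this predicate only says which path to take; it says nothing
about whether the entry was legitimate.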
> Do we know how many cycles wrgsbase takes, and how serializing
> is it? Sadly Agner Fog's tables don't seem to list it...
Not (architecturally) serialising, and pretty quick IIRC. It is
microcoded, but the segment registers are renamed so it can execute
speculatively.
~Andrew
>
> How would we actually do that overwriting and restoring of
> KERNEL_GS_BASE? Would we need a scratch register for that?