Message-ID: <e3f98cf1-71ff-4425-9deb-31d2ae989eac@citrix.com>
Date: Thu, 27 Feb 2025 00:41:42 +0000
From: Andrew Cooper <andrew.cooper3@...rix.com>
To: Rudolf Marek <r.marek@...embler.cz>, Jann Horn <jannh@...gle.com>
Cc: jmill@....edu, joao@...rdrivepizza.com, luto@...nel.org,
samitolvanen@...gle.com, "Peter Zijlstra (Intel)" <peterz@...radead.org>,
linux-hardening@...r.kernel.org, lkml <linux-kernel@...r.kernel.org>,
x86 maintainers <x86@...nel.org>
Subject: Re: [RFC] Circumventing FineIBT Via Entrypoints
On 26/02/2025 10:48 pm, Rudolf Marek wrote:
> Hi Andrew,
>
> On 25. 02. 25 at 22:14, Andrew Cooper wrote:
>> As a stand-in for "the reader", I'll point out that you need to add #DB to
>> that list or you're in for a rude surprise when running the x86
>> selftests.
>
> Thanks for pointing this out. I forgot about the interrupt shadow on
> SYSCALL and possibly some breakpoint possibilities in the kernel.
Isn't x86 lovely. This is yet another thing fixed in FRED; a CPL change
cancels pending_dbg.
>
>>> The SYSCALL/SYSENTER startup has interrupts disabled, so it is the
>>> problem of the NMI/#MC handler, which would need to deal with both
>>> the normal case and the attack case.
>>
>> Right, but in the case of the attack, regular interrupts are most likely
>> enabled too. And writing this has just caused me to realise a
>> yet-more-fun case.
>> An interrupt hitting the syscall entry path (prior to SWAPGS) will cause
>> the interrupt handler's CPL check and conditional SWAPGS to do the wrong
>> thing and switch onto the user GS base too. (Prior research e.g.
>> GhostRace has shown how to get an hrtimer to reliably hit an instruction
>> boundary.)
>
> I don't see it, because if the attacker starts at the syscall entry with
> interrupts enabled and the interrupt happens right there, the
> handler will just see a proper IRET frame with the kernel's %cs and will
> not perform swapgs. I will try to think about it again tomorrow; I
> likely missed something.
Nope, you're correct. I meant (after the SWAPGS).
The linear sequence of actions is:
* Follow bad fnptr to the SYSCALL entry
* SWAPGS (now on user gs)
* Interrupt. Handler sees %cs == kernel, so doesn't SWAPGS again
* Interrupt handler runs fully on user gs.
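
A toy model of the sequence above, as a sketch only. Every name here (struct cpu, swapgs(), syscall_entry_prologue(), interrupt_handler_gs() and the two scenario functions) is made up for illustration and is not kernel code; the model assumes SWAPGS acts as a plain toggle and that an ordinary interrupt entry decides purely on the CPL of the interrupted context.

```c
#include <assert.h>

/* Toy model only: these types and functions are hypothetical. */
enum gs_base { GS_KERNEL, GS_USER };
enum cpl { CPL_KERNEL = 0, CPL_USER = 3 };

struct cpu {
    enum gs_base gs;  /* which GS base is currently active */
    enum cpl cpl;     /* current privilege level */
};

/* SWAPGS exchanges the active and shadow GS base; modelled as a toggle. */
static void swapgs(struct cpu *c)
{
    c->gs = (c->gs == GS_KERNEL) ? GS_USER : GS_KERNEL;
}

/* The SYSCALL entry path assumes it came from userspace and swaps
 * unconditionally. */
static void syscall_entry_prologue(struct cpu *c)
{
    swapgs(c);
}

/* Ordinary (non-paranoid) interrupt entry: SWAPGS only if the interrupted
 * context was CPL3. Returns the GS base the handler body runs on.
 * Takes the CPU state by value because only the outcome matters here. */
static enum gs_base interrupt_handler_gs(struct cpu c)
{
    if (c.cpl == CPL_USER)
        swapgs(&c);
    return c.gs;
}

/* Genuine SYSCALL from userspace, then an interrupt after the SWAPGS. */
enum gs_base legit_syscall_then_irq(void)
{
    struct cpu c = { GS_USER, CPL_USER };
    c.cpl = CPL_KERNEL;            /* the SYSCALL instruction itself */
    syscall_entry_prologue(&c);    /* now on kernel GS */
    return interrupt_handler_gs(c);
}

/* Corrupt function pointer followed from kernel context to the SYSCALL
 * entry, then an interrupt after the SWAPGS. */
enum gs_base attack_then_irq(void)
{
    struct cpu c = { GS_KERNEL, CPL_KERNEL };
    syscall_entry_prologue(&c);    /* now on user GS */
    return interrupt_handler_gs(c);
}
```

In this model legit_syscall_then_irq() ends up on GS_KERNEL while attack_then_irq() ends up on GS_USER, which is the "handler runs fully on user gs" outcome from the list.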
>
>> Interrupts and exceptions look at %cs in the IRET frame to judge whether
>> to SWAPGS or not (and this is one of the main things that paranoid_entry
>> does differently). In the case of the attack, there's no IRET frame
>> pushed on the stack and the read of %cs is out-of-bounds, most likely
>> the stack frame of the function which followed the corrupt function
>> pointer.
>
> Thank you for your detailed explanation.
>
>> The SYSCALL entrypoint is simply the easiest to pivot on, but all can be
>> attacked in this manner. Fixing only the SYSCALL entrypoint doesn't
>> improve things much.
>
> Maybe a more elegant and cheap check of IDT entry "authenticity" would
> be to check the current %ss, which needs to be NULL, and possibly to
> check the %cs in the stack frame against the full kernel %cs rather than
> just the two CPL bits, and/or perform more checks.
>
> Some other ideas, if you think this topic is still worth discussing:
>
> What about using a completely different %cs selector for all entry
> code? The early entry code would check the %cs selector and panic if
> it is the wrong one.
>
> After the swapgs dance, we would need to perform a far jump to the
> normal kernel %cs, which might cost something.
>
> To fix the interrupt on fake entry problem, we could check in relevant
> IDT handlers that we never come from "completely different" %CS used
> above for the early entry code.
Ooh, this looks promising.
For IDT it's quite easy. Have a separate DPL0 %cs in the GDT, and write
it into the IDT.
For SYSCALL/SYSENTER it's a little more complicated. I think you want
to move the selectors so they don't alias __KERNEL_CS directly, so you can
then move back to __KERNEL_CS in a similar way.
Give or take paranoid_entry for the IST vectors, any entrypoint that
finds itself on __KERNEL_CS did not get there through the CPU loading a
new context.
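
A minimal sketch of that check, assuming a dedicated DPL0 entry selector exists. The selector values and all names below (SEL_ENTRY_CS, SEL_KERNEL_CS, check_entry_cs() and the two cs_after_*() helpers) are hypothetical stand-ins, and the IST/paranoid_entry caveat is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only; the dedicated entry selector is hypothetical. */
#define SEL_KERNEL_CS 0x10u  /* stand-in for the normal kernel %cs */
#define SEL_ENTRY_CS  0x60u  /* stand-in for a separate DPL0 %cs that only
                              * the IDT gates / SYSCALL setup would load */

enum entry_verdict { ENTRY_OK, ENTRY_FORGED };

/* Delivery through a gate makes the CPU load the gate's selector. */
uint16_t cs_after_gate_delivery(void)
{
    return SEL_ENTRY_CS;
}

/* A near indirect branch (e.g. via a corrupt function pointer) leaves
 * %cs untouched, so the stub is reached on whatever %cs was live. */
uint16_t cs_after_indirect_branch(uint16_t current_cs)
{
    return current_cs;
}

/* Check at the top of an entry stub: anything other than the dedicated
 * entry selector means the CPU did not load a new context to get here. */
enum entry_verdict check_entry_cs(uint16_t live_cs)
{
    return live_cs == SEL_ENTRY_CS ? ENTRY_OK : ENTRY_FORGED;
}
```

Here check_entry_cs(cs_after_gate_delivery()) is accepted, while check_entry_cs(cs_after_indirect_branch(SEL_KERNEL_CS)) is flagged as forged. A real stub would then far-jump back to the normal kernel selector once the check passes.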
It would depend on an attacker not being able to include a FAR CALL in
their exploit chain, or to write the IDT. I don't know how reasonable
that would be if we're ruling out all architectural paths not beginning
with an ENDBR, but FAR CALLs are rare owing to them being dog slow, and
an attacker who can write the IDT doesn't need these kinds of games to
pivot.
We do need at least one scratch register to check %cs. For IDT and
SYSENTER entries, we can reasonably well spill to the stack (again, an
attacker that can modify the stack has won without playing these games),
and for SYSCALL, we can use the low part of %r11 as you already
demonstrated.
Anyone fancy doing a prototype of this?
>
> And very last idea would be to somehow persuade the Last Branch
> Recording to record exception entries only and just check it from MSR.
> But maybe it is too costly and/or not possible.
This doesn't cover all cases, I don't think. It also won't work under
virt, where LBR isn't reliably available. Also LBR is reasonably full
of errata, and quite slow.
Also VMX clears it unilaterally on vmexit, and at least we don't have an
ENDBR in that path to worry about.
~Andrew