[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87wmkkpf5v.ffs@tglx>
Date: Tue, 13 Aug 2024 18:21:16 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Daniel Thompson <daniel.thompson@...aro.org>
Cc: Florian Rommel <mail@...rommel.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
"H . Peter Anvin" <hpa@...or.com>, Jason Wessel
<jason.wessel@...driver.com>, Douglas Anderson <dianders@...omium.org>,
Lorena Kretzschmar <qy15sije@....cs.fau.de>, Stefan Saecherl
<stefan.saecherl@....de>, Peter Zijlstra <peterz@...radead.org>,
Christophe JAILLET <christophe.jaillet@...adoo.fr>, Randy Dunlap
<rdunlap@...radead.org>, Masami Hiramatsu <mhiramat@...nel.org>, Andrew
Morton <akpm@...ux-foundation.org>, Christophe Leroy
<christophe.leroy@...roup.eu>, Geert Uytterhoeven
<geert+renesas@...der.be>, kgdb-bugreport@...ts.sourceforge.net,
x86@...nel.org, linux-kernel@...r.kernel.org, Kees Cook <kees@...nel.org>
Subject: Re: [PATCH v2 2/2] x86/kgdb: fix hang on failed breakpoint removal
Daniel!
On Tue, Aug 13 2024 at 12:31, Daniel Thompson wrote:
> On Mon, Aug 12, 2024 at 11:04:13PM +0200, Thomas Gleixner wrote:
>> Btw, kgdb_skipexception() is a gross nisnomer because it rewinds the
>> instruction pointer to the exception address and does not skip anything,
>> but that's an orthogonal issue though it could be cleaned up along the
>> way...
>
> kgdb_skipexception() is not a directive from debug core to architecture.
> It is a question to the archictecture: should the debug core skip normal
> debug exception handling and resume immediately?.
Ah. This code is so exceptionally intuitive ....
> It allows an architecture to direct the debug core to ignore a spurious
> trap that, according to the comments, can occur on some
> (micro)architectures when we have already restored the original
> not-a-breakpoint instruction.
Potentially due to the lack of sync_core() after the copy.
And of course the removal can fail as Florian discussed.
> Florian's patch changes things so that we will also skip debug exception
> handling if we can successfully poke to the text section. I don't think
> it is sufficient on its own since the text_mutex could be owned by the
> core that is stuck trapping on the int3 that we can't remove.
I was asking exactly that:
> What guarantees the release of text mutex?
:)
>> Aside of that the same problem exists on PowerPC. So you can move the
>> attempt to remove the breakpoint into the generic code, no?
>
> Getting the debug core to track breakpoints that are stuck on would be a
> good improvement.
>
> We would be able to use that logic to retry the removal of stuck
> breakpoint once other SMP cores are running again. That would be great
> for architectures like arm64 that use spin-with-irqsave locking in their
> noinstr text_poke() machinery.
>
> However it won't solve the problem for x86 since it uses mutex locking.
>
> A partial solution might be to get kgdb_arch_remove_breakpoint() to
> disregard the text_mutex completely if kdb_continue_catastrophic is set
> (and adding stuck breakpoints to the reasons to inhibit a continue).
The interrupted owner of text_mutex might be in the middle of modifying
the poking_mm page tables or in the worst case modifying the code which
kgdb wants to play with.
Dragons are guaranteed.
> This is a partial solution in the sense that it is not safe: we will
> simply tell the kernel dev driving the debugger that they are
> responsible for the safety of the continue and then disable the safety
> rails.
>
> I haven't yet come up with a full fix that doesn't require
> text_poke_kgdb() to not require text_mutex to be free. I did note that
> comment in __text_poke() that says calling get_locked_pte() "is not
> really needed, but this allows to avoid open-coding" but I got a bit lost
> trying to figure out other locks and nesting concerns.
So there are no other locks involved. The PTE lock is not strictly
required because the access to poking_mm is fully serialized.
You'd need to add a separate kgdb_poking_mm and kgdb_poking_addr. Now
make __text_poke() take a @mm and @pokeaddr argument and let it operate
on them. But that solves only part of the problem:
1) A concurrent modification of the same code will result in
undefined behaviour. Not sure whether that's actually an issue,
but I would not bet on it.
2) switch_mm() and the related code are not reentrancy safe either
3) TLB flushing. That can't use tlb_flush_mm_range() simply because
that's not reentrant.
Which makes me wonder about #2. As this stuff can run in NMI context,
then even if text_mutex is uncontended, then tlb_flush_mm_range() might
be what had been interrupted, so reentrancy would be fatal.
That's all a horrorshow as you can't play with CR0.WP either at least
not when CR4.CET is set.
So if you can force disable CET when KGDB is enabled, then you might get
away with:
bp->code = *p;
clear_cr0_wp();
*p = int3;
set_cr0_wp();
Though the hardening people might not be really happy about this.
But let's take a step back. Installing a breakpoint is done by the
debugger. The debugger knows the original code, right?
So why cant the debugger provide a trampoline:
ORIG_OP // with fixed up displacements
JMP NEXT_INSN
Now you stick that into a trampoline page slot and then patch the INT3
in. Now on removal can be a two stage approach:
bp->state = INACTIVE;
kick_breakpoint_gc();
breakpoint_gc() removes the breakpoint and invalidates the trampoline
from a safe context and up to that point kgdb_skipexception() can do:
bool kgdb_skip_exception(int exception, struct pt_regs *regs)
{
struct kgdb_bkpt *bp;
if (exception != 3)
return false;
bp = kgdb_get_inactive_breakpoint(--regs->ip, BP_BREAKPOINT);
if (bp)
regs->ip = bp->trampoline;
return true;
}
Hmm?
Thanks,
tglx
Powered by blists - more mailing lists