Message-ID: <871q2tsbaq.ffs@tglx>
Date: Mon, 12 Aug 2024 23:04:13 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Florian Rommel <mail@...rommel.de>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
"H . Peter Anvin" <hpa@...or.com>, Jason Wessel
<jason.wessel@...driver.com>, Daniel Thompson
<daniel.thompson@...aro.org>, Douglas Anderson <dianders@...omium.org>,
Lorena Kretzschmar <qy15sije@....cs.fau.de>, Stefan Saecherl
<stefan.saecherl@....de>, Peter Zijlstra <peterz@...radead.org>,
Christophe JAILLET <christophe.jaillet@...adoo.fr>, Randy Dunlap
<rdunlap@...radead.org>, Masami Hiramatsu <mhiramat@...nel.org>, Andrew
Morton <akpm@...ux-foundation.org>, Christophe Leroy
<christophe.leroy@...roup.eu>, Geert Uytterhoeven
<geert+renesas@...der.be>, kgdb-bugreport@...ts.sourceforge.net,
x86@...nel.org, linux-kernel@...r.kernel.org
Cc: Florian Rommel <mail@...rommel.de>
Subject: Re: [PATCH v2 2/2] x86/kgdb: fix hang on failed breakpoint removal
Florian!
On Mon, Aug 12 2024 at 19:43, Florian Rommel wrote:
> On x86, occasionally, the removal of a breakpoint (i.e., removal of
> the int3 instruction) fails because the text_mutex is taken by another
> CPU (mainly due to the static_key mechanism, I think).
Either you know it or not. Speculation is reserved for CPUs.
> The function kgdb_skipexception catches exceptions from these spurious
> int3 instructions, bails out of KGDB, and continues execution from the
> previous PC address.
>
> However, this led to an endless loop between the int3 instruction and
> kgdb_skipexception since the int3 instruction (being still present)
> triggered again. This effectively caused the system to hang.
>
> With this patch, we try to remove the concerned spurious int3
> instruction in kgdb_skipexception before continuing execution. This
> may take a few attempts until the concurrent holders of the text_mutex
> have released it, but eventually succeeds and the kernel can continue.
What guarantees the release of text mutex?
> Signed-off-by: Florian Rommel <mail@...rommel.de>
> ---
> arch/x86/kernel/kgdb.c | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
>
> diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
> index 64c332151af7..585a7a72af74 100644
> --- a/arch/x86/kernel/kgdb.c
> +++ b/arch/x86/kernel/kgdb.c
> @@ -723,7 +723,31 @@ void kgdb_arch_exit(void)
> int kgdb_skipexception(int exception, struct pt_regs *regs)
Btw, kgdb_skipexception() is a gross misnomer because it rewinds the
instruction pointer to the exception address and does not skip anything,
but that's an orthogonal issue, though it could be cleaned up along the
way...
> {
> if (exception == 3 && kgdb_isremovedbreak(regs->ip - 1)) {
> + struct kgdb_bkpt *bpt;
> + int i, error;
> +
> regs->ip -= 1;
> +
> + /*
> + * Try to remove the spurious int3 instruction.
> + * These int3s can result from failed breakpoint removals
> + * in kgdb_arch_remove_breakpoint.
> + */
> + for (bpt = NULL, i = 0; i < KGDB_MAX_BREAKPOINTS; i++) {
> + if (kgdb_break[i].bpt_addr == regs->ip &&
> + kgdb_break[i].state == BP_REMOVED &&
> + (kgdb_break[i].type == BP_BREAKPOINT ||
> + kgdb_break[i].type == BP_POKE_BREAKPOINT)) {
> + bpt = &kgdb_break[i];
> + break;
> + }
> + }
Seriously? The KGDB core already walked that array in
kgdb_isremovedbreak() just so you can walk it again here.
struct kgdb_bkpt *kgdb_get_removed_breakpoint(unsigned long addr)
{
	struct kgdb_bkpt *bp = kgdb_break;

	for (int i = 0; i < KGDB_MAX_BREAKPOINTS; i++, bp++) {
		if (bp->state == BP_REMOVED && bp->bpt_addr == addr)
			return bp;
	}
	return NULL;
}

bool kgdb_isremovedbreak(unsigned long addr)
{
	return !!kgdb_get_removed_breakpoint(addr);
}

bool kgdb_rewind_exception(int exception, struct pt_regs *regs)
{
	struct kgdb_bkpt *bp;

	if (exception != 3)
		return false;

	bp = kgdb_get_removed_breakpoint(--regs->ip);
	if (!bp || bp->type != BP_BREAKPOINT)
		return false;
Hmm?
> + error = kgdb_arch_remove_breakpoint(bpt);
> + if (error)
> + pr_err("skipexception: breakpoint remove failed: %lx\n",
> + bpt->bpt_addr);
Lacks curly brackets. See Documentation.
return !error;
Aside from that, the same problem exists on PowerPC. So you can move the
attempt to remove the breakpoint into the generic code, no?
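To make the lookup-once idea concrete, here is a rough userspace model of
it. It is only a sketch under assumptions: the struct layouts, the enum
values, KGDB_MAX_BREAKPOINTS = 4, the remove_failures knob and the stubbed
kgdb_arch_remove_breakpoint() are simplified stand-ins for the kernel
definitions, and fprintf() stands in for pr_err(). It is not kernel code
and not a tested patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define KGDB_MAX_BREAKPOINTS 4	/* stand-in value for this model */

/* Simplified stand-ins for the kernel's kgdb types. */
enum kgdb_bpstate { BP_UNDEFINED, BP_REMOVED, BP_SET, BP_ACTIVE };
enum kgdb_bptype { BP_BREAKPOINT, BP_HARDWARE_BREAKPOINT };

struct kgdb_bkpt {
	unsigned long bpt_addr;
	enum kgdb_bptype type;
	enum kgdb_bpstate state;
};

struct pt_regs {
	unsigned long ip;
};

static struct kgdb_bkpt kgdb_break[KGDB_MAX_BREAKPOINTS];

/* Hypothetical knob: number of removal attempts that will still fail. */
static int remove_failures;

/* Stub for the arch hook: fails until remove_failures is used up. */
static int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
{
	(void)bpt;
	if (remove_failures > 0) {
		remove_failures--;
		return -1;
	}
	return 0;
}

/* Single array walk, shared by both callers below. */
static struct kgdb_bkpt *kgdb_get_removed_breakpoint(unsigned long addr)
{
	struct kgdb_bkpt *bp = kgdb_break;

	for (int i = 0; i < KGDB_MAX_BREAKPOINTS; i++, bp++) {
		if (bp->state == BP_REMOVED && bp->bpt_addr == addr)
			return bp;
	}
	return NULL;
}

static bool kgdb_isremovedbreak(unsigned long addr)
{
	return kgdb_get_removed_breakpoint(addr) != NULL;
}

/* Rewind to the int3 address and retry the removal of a stale int3. */
static bool kgdb_rewind_exception(int exception, struct pt_regs *regs)
{
	struct kgdb_bkpt *bp;
	int error;

	if (exception != 3)
		return false;

	bp = kgdb_get_removed_breakpoint(--regs->ip);
	if (!bp || bp->type != BP_BREAKPOINT)
		return false;

	error = kgdb_arch_remove_breakpoint(bp);
	if (error) {
		/* Multi-line body, hence the curly brackets. */
		fprintf(stderr, "breakpoint remove failed: %lx\n",
			bp->bpt_addr);
	}
	return !error;
}
```

In this model a caller would mark an entry in kgdb_break[] as BP_REMOVED
and pass exception 3 with regs->ip one byte past the breakpoint address;
kgdb_rewind_exception() returns true only once the stubbed removal
succeeds.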
Thanks,
tglx