[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20190730221800.28326-1-dianders@chromium.org>
Date: Tue, 30 Jul 2019 15:18:00 -0700
From: Douglas Anderson <dianders@...omium.org>
To: Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
Jason Wessel <jason.wessel@...driver.com>,
Daniel Thompson <daniel.thompson@...aro.org>
Cc: kgdb-bugreport@...ts.sourceforge.net,
Douglas Anderson <dianders@...omium.org>,
linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
Christophe Leroy <christophe.leroy@....fr>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Mark Rutland <mark.rutland@....com>,
linux-arm-kernel@...ts.infradead.org
Subject: [PATCH] arm64: debug: Make 'btc' and similar work in kdb
In kdb when you do 'btc' (back trace on CPU) it doesn't give you the
right info. This can be seen by this:
echo SOFTLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
# wait 2 seconds
<sysrq>g
Here's what I see now on rk3399-gru-kevin. I see the stack crawl for
the CPU that handled the sysrq but everything else just shows me stuck
in __switch_to() which is bogus:
======
[0]kdb> btc
btc: cpu status: Currently on cpu 0
Available cpus: 0, 1-3(I), 4, 5(I)
Stack traceback for pid 0
0xffffff801101a9c0 0 0 1 0 R 0xffffff801101b3b0 *swapper/0
Call trace:
dump_backtrace+0x0/0x138
...
kgdb_compiled_brk_fn+0x34/0x44
...
sysrq_handle_dbg+0x34/0x5c
Stack traceback for pid 0
0xffffffc0f175a040 0 0 1 1 I 0xffffffc0f175aa30 swapper/1
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f65616c0
Stack traceback for pid 0
0xffffffc0f175d040 0 0 1 2 I 0xffffffc0f175da30 swapper/2
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f65806c0
Stack traceback for pid 0
0xffffffc0f175b040 0 0 1 3 I 0xffffffc0f175ba30 swapper/3
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f659f6c0
Stack traceback for pid 1474
0xffffffc0dde8b040 1474 727 1 4 R 0xffffffc0dde8ba30 bash
Call trace:
__switch_to+0x1e4/0x240
__schedule+0x464/0x618
0xffffffc0dde8b040
Stack traceback for pid 0
0xffffffc0f17b0040 0 0 1 5 I 0xffffffc0f17b0a30 swapper/5
Call trace:
__switch_to+0x1e4/0x240
0xffffffc0f65dd6c0
===
The problem is that 'btc' eventually boils down to
show_stack(task_struct, NULL);
...and show_stack() doesn't work for "running" CPUs because their
registers haven't been stashed.
On x86 things might work better (I haven't tested) because kdb has a
special case for x86 in kdb_show_stack() where it passes the stack
pointer to show_stack(). This wouldn't work on arm64 where the stack
crawling function seems needs the "fp" and "pc", not the "sp" which is
presumably why arm64's show_stack() function totally ignores the "sp"
parameter.
NOTE: we _can_ get a good stack dump for all the cpus if we manually
switch each one to the kdb master and do a back trace. AKA:
cpu 4
bt
...will give the expected trace. That's because now arm64's
dump_backtrace will now see that "tsk == current" and go through a
different path.
In this patch I fix the problems by stashing the "pt_regs" into the
"cpu_context" when a CPU enters the debugger. Now all the normal stack
crawling code will be able to find it. This is possible because:
* When a task is "running" nobody else is using the "cpu_context"
* The task isn't really "running" (it's in the debugger) so there are
actually some sane registers to save.
This patch works without any extra kgdb API changes by just
implementing the weak kgdb_call_nmi_hook(). I don't try to address
the existing caveat in kgdb_call_nmi_hook() around pt_regs, so I copy
the comment from the generic code.
After this patch the same test shows much more sane stack crawls. The
idle tasks now show:
Stack traceback for pid 0
0xffffffc0f175b040 0 0 1 3 I 0xffffffc0f175ba30 swapper/3
Call trace:
cpuidle_enter_state+0x284/0x428
cpuidle_enter+0x38/0x4c
do_idle+0x168/0x29c
cpu_startup_entry+0x24/0x28
secondary_start_kernel+0x140/0x14c
...and the locked task:
Stack traceback for pid 1603
0xffffffc0d98c7040 1603 724 1 4 R 0xffffffc0d98c7a30 bash
Call trace:
lkdtm_SOFTLOCKUP+0x1c/0x24
lkdtm_do_action+0x24/0x44
direct_entry+0x130/0x178
full_proxy_write+0x60/0xb4
__vfs_write+0x54/0x18c
vfs_write+0xcc/0x174
ksys_write+0x7c/0xe4
__arm64_sys_write+0x20/0x2c
el0_svc_common+0x9c/0x14c
el0_svc_compat_handler+0x28/0x34
el0_svc_compat+0x8/0x10
Signed-off-by: Douglas Anderson <dianders@...omium.org>
---
arch/arm64/kernel/kgdb.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 43119922341f..b666210fbc75 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -148,6 +148,45 @@ sleeping_thread_to_gdb_regs(unsigned long *gdb_regs, struct task_struct *task)
gdb_regs[32] = cpu_context->pc;
}
+void kgdb_call_nmi_hook(void *ignored)
+{
+ struct pt_regs *regs;
+
+ /*
+ * NOTE: get_irq_regs() is supposed to get the registers from
+ * before the IPI interrupt happened and so is supposed to
+ * show where the processor was. In some situations it's
+ * possible we might be called without an IPI, so it might be
+ * safer to figure out how to make kgdb_breakpoint() work
+ * properly here.
+ */
+ regs = get_irq_regs();
+
+ /*
+ * Some commands (like 'btc') assume that they can find info about
+ * a task in the 'cpu_context'. Unfortunately that's only valid
+ * for sleeping tasks. ...but let's make it work anyway by just
+ * writing the registers to the right place. This is safe because
+ * nobody else is using the 'cpu_context' for a running task.
+ */
+ current->thread.cpu_context.x19 = regs->regs[19];
+ current->thread.cpu_context.x20 = regs->regs[20];
+ current->thread.cpu_context.x21 = regs->regs[21];
+ current->thread.cpu_context.x22 = regs->regs[22];
+ current->thread.cpu_context.x23 = regs->regs[23];
+ current->thread.cpu_context.x24 = regs->regs[24];
+ current->thread.cpu_context.x25 = regs->regs[25];
+ current->thread.cpu_context.x26 = regs->regs[26];
+ current->thread.cpu_context.x27 = regs->regs[27];
+ current->thread.cpu_context.x28 = regs->regs[28];
+ current->thread.cpu_context.fp = regs->regs[29];
+
+ current->thread.cpu_context.sp = regs->sp;
+ current->thread.cpu_context.pc = regs->pc;
+
+ kgdb_nmicallback(raw_smp_processor_id(), regs);
+}
+
void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc)
{
regs->pc = pc;
--
2.22.0.709.g102302147b-goog
Powered by blists - more mailing lists