[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20231219032254.96685-1-feng.tang@intel.com>
Date: Tue, 19 Dec 2023 11:22:54 +0800
From: Feng Tang <feng.tang@...el.com>
To: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>
Cc: Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>,
linux-kernel@...r.kernel.org,
Feng Tang <feng.tang@...el.com>
Subject: [PATCH] sched/debug: Dump end of stack when detected corrupted
When debugging a kernel hang during suspend/resume, there are random
memory corruptions in different places like being detected by scheduler
with error message:
"Kernel panic - not syncing: corrupted stack end detected inside scheduler"
Dump the corrupted memory around the stack end will give more direct
hints about how the memory is corrupted:
"
Corrupted Stack: ff11000122770000: ff ff ff ff ff ff 14 91 82 3b 78 e8 08 00 45 00 .........;x...E.
Corrupted Stack: ff11000122770010: 00 1d 2a ff 40 00 40 11 98 c8 0a ef 30 2c 0a ef ..*.@.@.....0,..
Corrupted Stack: ff11000122770020: 30 ff a2 00 22 3d 00 09 9a 95 2a 00 00 00 00 00 0..."=....*.....
...
Kernel panic - not syncing: corrupted stack end detected inside scheduler
"
And with it, the culprit was quickly identified to be an ethernet
driver with its DMA operations.
Signed-off-by: Feng Tang <feng.tang@...el.com>
---
kernel/sched/core.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a795e030678c..1280f7012bc5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5949,8 +5949,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
static inline void schedule_debug(struct task_struct *prev, bool preempt)
{
#ifdef CONFIG_SCHED_STACK_END_CHECK
- if (task_stack_end_corrupted(prev))
+ if (task_stack_end_corrupted(prev)) {
+ unsigned long *ptr = end_of_stack(prev);
+
+ /* Dump 16 ulong words around the corruption point */
+#ifdef CONFIG_STACK_GROWSUP
+ ptr -= 15;
+#endif
+ print_hex_dump(KERN_ERR, "Corrupted Stack: ",
+ DUMP_PREFIX_ADDRESS, 16, 1, ptr, 16 * sizeof(*ptr), 1);
+
panic("corrupted stack end detected inside scheduler\n");
+ }
if (task_scs_end_corrupted(prev))
panic("corrupted shadow stack detected inside scheduler\n");
--
2.27.0
Powered by blists - more mailing lists