Message-Id: <20240903163355.3187-1-glaubitz@physik.fu-berlin.de>
Date: Tue, 3 Sep 2024 18:33:55 +0200
From: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
To: feng.tang@...el.com
Cc: akpm@...ux-foundation.org,
bristot@...hat.com,
bsegall@...gle.com,
dietmar.eggemann@....com,
juri.lelli@...hat.com,
linux-kernel@...r.kernel.org,
mgorman@...e.de,
mingo@...hat.com,
peterz@...radead.org,
rostedt@...dmis.org,
vbabka@...e.cz,
vincent.guittot@...aro.org,
vschneid@...hat.com,
sparclinux@...r.kernel.org
Subject: Re: sched/debug: Dump end of stack when detected corrupted

Hi Feng,

> When debugging a kernel hang during suspend/resume, random memory
> corruption showed up in different places, for example detected by the
> scheduler with the error message:
>
> "Kernel panic - not syncing: corrupted stack end detected inside scheduler"
>
> Dumping the corrupted memory around the stack end gives more direct
> hints about how the memory was corrupted:
>
> "
> Corrupted Stack: ff11000122770000: ff ff ff ff ff ff 14 91 82 3b 78 e8 08 00 45 00 .........;x...E.
> Corrupted Stack: ff11000122770010: 00 1d 2a ff 40 00 40 11 98 c8 0a ef 30 2c 0a ef ..*.@.@.....0,..
> Corrupted Stack: ff11000122770020: 30 ff a2 00 22 3d 00 09 9a 95 2a 00 00 00 00 00 0..."=....*.....
> ...
> Kernel panic - not syncing: corrupted stack end detected inside scheduler
> "
>
> And with it, the culprit was quickly identified to be an ethernet
> driver with its DMA operations.
>
> Signed-off-by: Feng Tang <feng.tang@...el.com>
> ---
> kernel/sched/core.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a795e030678c..1280f7012bc5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5949,8 +5949,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
>  static inline void schedule_debug(struct task_struct *prev, bool preempt)
>  {
>  #ifdef CONFIG_SCHED_STACK_END_CHECK
> -	if (task_stack_end_corrupted(prev))
> +	if (task_stack_end_corrupted(prev)) {
> +		unsigned long *ptr = end_of_stack(prev);
> +
> +		/* Dump 16 ulong words around the corruption point */
> +#ifdef CONFIG_STACK_GROWSUP
> +		ptr -= 15;
> +#endif
> +		print_hex_dump(KERN_ERR, "Corrupted Stack: ",
> +			DUMP_PREFIX_ADDRESS, 16, 1, ptr, 16 * sizeof(*ptr), 1);
> +
>  		panic("corrupted stack end detected inside scheduler\n");
> +	}
>
>  	if (task_scs_end_corrupted(prev))
>  		panic("corrupted shadow stack detected inside scheduler\n");

Have you gotten any feedback on this? It would be nice to get this merged, as we're
seeing crashes due to stack corruption on sparc from time to time, and having the
end of the stack dumped in such cases would make debugging a bit easier.

Thanks,
Adrian

--
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913