Message-Id: <20240903163355.3187-1-glaubitz@physik.fu-berlin.de>
Date: Tue,  3 Sep 2024 18:33:55 +0200
From: John Paul Adrian Glaubitz <glaubitz@...sik.fu-berlin.de>
To: feng.tang@...el.com
Cc: akpm@...ux-foundation.org,
	bristot@...hat.com,
	bsegall@...gle.com,
	dietmar.eggemann@....com,
	juri.lelli@...hat.com,
	linux-kernel@...r.kernel.org,
	mgorman@...e.de,
	mingo@...hat.com,
	peterz@...radead.org,
	rostedt@...dmis.org,
	vbabka@...e.cz,
	vincent.guittot@...aro.org,
	vschneid@...hat.com,
	sparclinux@...r.kernel.org
Subject: Re: sched/debug: Dump end of stack when detected corrupted

Hi Feng,

> When debugging a kernel hang during suspend/resume, there were random
> memory corruptions in different places, some of which were detected by
> the scheduler with the error message:
> 
>   "Kernel panic - not syncing: corrupted stack end detected inside scheduler"
> 
> Dumping the corrupted memory around the stack end gives more direct
> hints about how the memory was corrupted:
> 
>  "
>  Corrupted Stack: ff11000122770000: ff ff ff ff ff ff 14 91 82 3b 78 e8 08 00 45 00  .........;x...E.
>  Corrupted Stack: ff11000122770010: 00 1d 2a ff 40 00 40 11 98 c8 0a ef 30 2c 0a ef  ..*.@.@.....0,..
>  Corrupted Stack: ff11000122770020: 30 ff a2 00 22 3d 00 09 9a 95 2a 00 00 00 00 00  0..."=....*.....
>  ...
>  Kernel panic - not syncing: corrupted stack end detected inside scheduler
>  "
> 
> With it, the culprit was quickly identified as an Ethernet
> driver and its DMA operations.
> 
> Signed-off-by: Feng Tang <feng.tang@...el.com>
> ---
>  kernel/sched/core.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a795e030678c..1280f7012bc5 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5949,8 +5949,18 @@ static noinline void __schedule_bug(struct task_struct *prev)
>  static inline void schedule_debug(struct task_struct *prev, bool preempt)
>  {
>  #ifdef CONFIG_SCHED_STACK_END_CHECK
> -	if (task_stack_end_corrupted(prev))
> +	if (task_stack_end_corrupted(prev)) {
> +		unsigned long *ptr = end_of_stack(prev);
> +
> +		/* Dump 16 ulong words around the corruption point */
> +#ifdef CONFIG_STACK_GROWSUP
> +		ptr -= 15;
> +#endif
> +		print_hex_dump(KERN_ERR, "Corrupted Stack: ",
> +			DUMP_PREFIX_ADDRESS, 16, 1, ptr, 16 * sizeof(*ptr), 1);
> +
>  		panic("corrupted stack end detected inside scheduler\n");
> +	}
>  
>  	if (task_scs_end_corrupted(prev))
>  		panic("corrupted shadow stack detected inside scheduler\n");

Have you gotten any feedback on this? It would be nice to get this merged, as
we're seeing crashes due to stack corruption on sparc from time to time, and
having the end of the stack dumped in such cases would make debugging a bit easier.

Thanks,
Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer
`. `'   Physicist
  `-    GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913
