linux-kernel - Re: [PATCH v4] panic: Fixes the panic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y+ue4OsyrGSx5ujB@alley>
Date:   Tue, 14 Feb 2023 15:46:56 +0100
From:   Petr Mladek <pmladek@...e.com>
To:     "Guilherme G. Piccoli" <gpiccoli@...lia.com>
Cc:     akpm@...ux-foundation.org, bhe@...hat.com,
        linux-kernel@...r.kernel.org, kexec@...ts.infradead.org,
        dyoung@...hat.com, d.hatayama@...fujitsu.com, feng.tang@...el.com,
        hidehiro.kawai.ez@...achi.com, keescook@...omium.org,
        mikelley@...rosoft.com, vgoyal@...hat.com, kernel-dev@...lia.com,
        kernel@...ccoli.net, stable@...r.kernel.org
Subject: Re: [PATCH v4] panic: Fixes the panic_print NMI backtrace setting

On Fri 2023-02-10 17:35:10, Guilherme G. Piccoli wrote:
> Commit 8d470a45d1a6 ("panic: add option to dump all CPUs backtraces in panic_print")
> introduced a setting for the "panic_print" kernel parameter to allow
> users to request a NMI backtrace on panic. Problem is that the panic_print
> handling happens after the secondary CPUs are already disabled, hence
> this option ended-up being kind of a no-op - kernel skips the NMI trace
> in idling CPUs, which is the case of offline CPUs.

Great catch!

> Hi folks, thanks in advance for reviews/comments.
> 
> Notice that while at it, I got rid of the "crash_kexec_post_notifiers"
> local copy in panic(). This was introduced by commit b26e27ddfd2a
> ("kexec: use core_param for crash_kexec_post_notifiers boot option"),
> but it is not clear from comments or commit message why this local copy
> is required.
> 
> My understanding is that it's a mechanism to prevent some concurrency,
> in case some other CPU modify this variable while panic() is running.
> I find it very unlikely, hence I removed it - but if people consider
> this copy needed, I can respin this patch and keep it, even providing a
> comment about that, in order to be explict about its need.

Yes, I think that it makes the behavior consistent even when the
global variable manipulated in parallel.

I would personally prefer to keep the local copy. Better safe
than sorry.

> Let me know your thoughts!
> Cheers,
> 
> Guilherme
> 
> 
>  kernel/panic.c | 47 +++++++++++++++++++++++++++--------------------
>  1 file changed, 27 insertions(+), 20 deletions(-)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 463c9295bc28..f45ee88be8a2 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -211,9 +211,6 @@ static void panic_print_sys_info(bool console_flush)
>  		return;
>  	}
>  
> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
> -		trigger_all_cpu_backtrace();
> -

Sigh, this is yet another PANIC_PRINT_ action that need special
timing. We should handle both the same way.

What about the following? The parameter @mask says what
actions are allowed at the given time.

--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -72,6 +72,9 @@ EXPORT_SYMBOL_GPL(panic_timeout);
 #define PANIC_PRINT_FTRACE_INFO		0x00000010
 #define PANIC_PRINT_ALL_PRINTK_MSG	0x00000020
 #define PANIC_PRINT_ALL_CPU_BT		0x00000040
+/* Filter out actions that need special timing. */
+#define PANIC_PRINT_COMMON_INFO_MASK	~(PANIC_PRINT_ALL_PRINTK_MSG |	 \
+					  PANIC_PRINT_ALL_CPU_BT)
 unsigned long panic_print;
 
 ATOMIC_NOTIFIER_HEAD(panic_notifier_list);
@@ -203,30 +206,29 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
 }
 EXPORT_SYMBOL(nmi_panic);
 
-static void panic_print_sys_info(bool console_flush)
+static void panic_print_sys_info(unsigned long mask)
 {
-	if (console_flush) {
-		if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
-			console_flush_on_panic(CONSOLE_REPLAY_ALL);
-		return;
-	}
+	unsigned long panic_print_now = panic_print & mask;
+
+	if (panic_print_now & PANIC_PRINT_ALL_PRINTK_MSG)
+		console_flush_on_panic(CONSOLE_REPLAY_ALL);
 
-	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+	if (panic_print_now & PANIC_PRINT_ALL_CPU_BT)
 		trigger_all_cpu_backtrace();
 
-	if (panic_print & PANIC_PRINT_TASK_INFO)
+	if (panic_print_now & PANIC_PRINT_TASK_INFO)
 		show_state();
 
-	if (panic_print & PANIC_PRINT_MEM_INFO)
+	if (panic_print_now & PANIC_PRINT_MEM_INFO)
 		show_mem(0, NULL);
 
-	if (panic_print & PANIC_PRINT_TIMER_INFO)
+	if (panic_print_now & PANIC_PRINT_TIMER_INFO)
 		sysrq_timer_list_show();
 
-	if (panic_print & PANIC_PRINT_LOCK_INFO)
+	if (panic_print_now & PANIC_PRINT_LOCK_INFO)
 		debug_show_all_locks();
 
-	if (panic_print & PANIC_PRINT_FTRACE_INFO)
+	if (panic_print_now & PANIC_PRINT_FTRACE_INFO)
 		ftrace_dump(DUMP_ALL);
 }
 
@@ -333,9 +335,12 @@ void panic(const char *fmt, ...)
 	 *
 	 * Bypass the panic_cpu check and call __crash_kexec directly.
 	 */
-	if (!_crash_kexec_post_notifiers) {
+	if (!_crash_kexec_post_notifiers)
 		__crash_kexec(NULL);
 
+	panic_print_sys_info(PANIC_PRINT_ALL_CPU_BT);
+
+	if (!_crash_kexec_post_notifiers) {
 		/*
 		 * Note smp_send_stop is the usual smp shutdown function, which
 		 * unfortunately means it may not be hardened to work in a
@@ -357,7 +362,7 @@ void panic(const char *fmt, ...)
 	 */
 	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
 
-	panic_print_sys_info(false);
+	panic_print_sys_info(PANIC_PRINT_COMMON_INFO_MASK);
 
 	kmsg_dump(KMSG_DUMP_PANIC);
 
@@ -386,7 +391,7 @@ void panic(const char *fmt, ...)
 	debug_locks_off();
 	console_flush_on_panic(CONSOLE_FLUSH_PENDING);
 
-	panic_print_sys_info(true);
+	panic_print_sys_info(PANIC_PRINT_ALL_PRINTK_MSG);
 
 	if (!panic_blink)
 		panic_blink = no_blink;


Best Regards,
Petr