linux-kernel - Re: [PATCH] printk: CPU backtrace not printing on panic

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240729114601.176047-1-takakura@valinux.co.jp>
Date: Mon, 29 Jul 2024 20:46:01 +0900
From: takakura@...inux.co.jp
To: pmladek@...e.com,
	john.ogness@...utronix.de
Cc: rostedt@...dmis.org,
	senozhatsky@...omium.org,
	akpm@...ux-foundation.org,
	bhe@...hat.com,
	lukas@...ner.de,
	wangkefeng.wang@...wei.com,
	ubizjak@...il.com,
	feng.tang@...el.com,
	j.granados@...sung.com,
	stephen.s.brennan@...cle.com,
	linux-kernel@...r.kernel.org,
	nishimura@...inux.co.jp,
	taka@...inux.co.jp
Subject: Re: [PATCH] printk: CPU backtrace not printing on panic

Hi Petr and John,

Thanks for the feeedback!
Sorry that I was not keeping up with the recent nbcon updates and I 
don't think the patch I sent was reflecting them.

On 2024-07-26, John Ogness wrote:
>On 2024-07-26, Petr Mladek <pmladek@...e.com> wrote:
>> I would do it the other way and enable printing from other CPUs only
>> when triggring the backtrace. We could do it because
>> trigger_all_cpu_backtrace() waits until all backtraces are
>> printed.
>>

Yes, I think doing it other way around works better filtering out 
the non-panicked CPU's irrelevant messages as it does now.
As suggested, it also makes it simple as it removes the need for 
what I suggested (bringing back commit 13fb0f74d702 ("printk: Avoid 
livelock with heavy printk during panic")) which intention was 
to check wheather to trigger backtrace or not based on those 
irrelevant messages.

>> Something like:
>>
>> diff --git a/include/linux/panic.h b/include/linux/panic.h
>> index 3130e0b5116b..980bacbdfcfc 100644
>> --- a/include/linux/panic.h
>> +++ b/include/linux/panic.h
>> @@ -16,6 +16,7 @@ extern void oops_enter(void);
>>  extern void oops_exit(void);
>>  extern bool oops_may_print(void);
>>  
>> +extern int panic_triggering_all_cpu_backtrace;
>>  extern int panic_timeout;
>>  extern unsigned long panic_print;
>>  extern int panic_on_oops;
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index f861bedc1925..7e9e97d59b1e 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -64,6 +64,8 @@ unsigned long panic_on_taint;
>>  bool panic_on_taint_nousertaint = false;
>>  static unsigned int warn_limit __read_mostly;
>>  
>> +int panic_triggering_all_cpu_backtrace;
>> +
>>  int panic_timeout = CONFIG_PANIC_TIMEOUT;
>>  EXPORT_SYMBOL_GPL(panic_timeout);
>>  
>> @@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
>>   */
>>  static void panic_other_cpus_shutdown(bool crash_kexec)
>>  {
>> -	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> +	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>> +		/* Temporary allow printing messages on non-panic CPUs. */
>> +		panic_triggering_all_cpu_backtrace = true;
>>  		trigger_all_cpu_backtrace();
>> +		panic_triggering_all_cpu_backtrace = false;
>
>Note, here we should also add
>
>		nbcon_atomic_flush_pending();
>
>Your suggestion allows the other CPUs to dump their backtrace into the
>ringbuffer, but they are still forbidden from acquiring the nbcon
>console contexts for printing. That is a necessary requirement of
>nbcon_waiter_matches().

Thanks for pointing it out. I see that it can only allows to write into 
the ringbuffer and printing also needs to be taken care as well.

>Or since the cpu_sync is held while printing the backtrace, we could
>allow the non-panic CPUs to print by modifying the check in
>nbcon_context_try_acquire_direct():
>
>----- BEGIN -----
>diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>index ef6e76db0f5a..cd8724840edc 100644
>--- a/kernel/printk/nbcon.c
>+++ b/kernel/printk/nbcon.c
>@@ -241,7 +241,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> 	struct nbcon_state new;
> 
> 	do {
>-		if (other_cpu_in_panic())
>+		if (other_cpu_in_panic() && !__printk_cpu_sync_owner())
> 			return -EPERM;
> 
> 		if (ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio)
>diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>index 613644c55e54..f486d0b84a84 100644
>--- a/kernel/printk/printk.c
>+++ b/kernel/printk/printk.c
>@@ -4569,6 +4569,11 @@ void __printk_cpu_sync_wait(void)
> }
> EXPORT_SYMBOL(__printk_cpu_sync_wait);
> 
>+bool __printk_cpu_sync_owner(void)
>+{
>+	return (atomic_read(&printk_cpu_sync_owner) == raw_smp_processor_id());
>+}
>+
> /**
>  * __printk_cpu_sync_try_get() - Try to acquire the printk cpu-reentrant
>  *                               spinning lock.
>----- END -----
>
>> +	}
>>  
>>  	/*
>>  	 * Note that smp_send_stop() is the usual SMP shutdown function,
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 054c0e7784fd..c22b07049c38 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2316,7 +2316,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>>  	 * non-panic CPUs are generating any messages, they will be
>>  	 * silently dropped.
>>  	 */
>> -	if (other_cpu_in_panic())
>> +	if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
>>  		return 0;
>I wonder if it is enough to check if it is holding the cpu_sync. Then we
>would not need @panic_triggering_all_cpu_backtrace.

Taking a look at the cpu_sync, it seems CPU backtrace is the only one holding 
the cpu_sync after panic. If so, I also think that it can replace 
@panic_triggering_all_cpu_backtrace.

But I thought that we still need @panic_triggering_all_cpu_backtrace
to let non-panic CPUs writing their backtrace to ringbuffer if we were to 
use cpu_sync to pass the check within nbcon_context_try_acquire_direct().
Or can we use cpu_sync for checking wheather non-panicked CPUs can write to 
ringbuffer and let nbcon_atomic_flush_pending() do the printing something like:

----- BEGIN -----
diff --git a/kernel/panic.c b/kernel/panic.c
index 7e207092576b..eed76e3e061b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -252,8 +252,10 @@ void check_panic_on_warn(const char *origin)
  */
 static void panic_other_cpus_shutdown(bool crash_kexec)
 {
-	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+	if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
 		trigger_all_cpu_backtrace();
+		nbcon_atomic_flush_pending();
+	}

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d0bff0b0abfd..b8132801ea07 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2354,7 +2354,7 @@ asmlinkage int vprintk_emit(int facility, int level,
 	 * non-panic CPUs are generating any messages, they will be
 	 * silently dropped.
 	 */
-	if (other_cpu_in_panic())
+	if (other_cpu_in_panic() && !__printk_cpu_sync_owner())
 		return 0;
 
 	if (level == LOGLEVEL_SCHED) {
@@ -4511,6 +4511,11 @@ void __printk_cpu_sync_wait(void)
 }
 EXPORT_SYMBOL(__printk_cpu_sync_wait);
 
+bool __printk_cpu_sync_owner(void)
+{
+	return (atomic_read(&printk_cpu_sync_owner) == raw_smp_processor_id());
+}
+
----- END -----


>John

Sincerely,
Ryo Takakura