[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240729114601.176047-1-takakura@valinux.co.jp>
Date: Mon, 29 Jul 2024 20:46:01 +0900
From: takakura@...inux.co.jp
To: pmladek@...e.com,
john.ogness@...utronix.de
Cc: rostedt@...dmis.org,
senozhatsky@...omium.org,
akpm@...ux-foundation.org,
bhe@...hat.com,
lukas@...ner.de,
wangkefeng.wang@...wei.com,
ubizjak@...il.com,
feng.tang@...el.com,
j.granados@...sung.com,
stephen.s.brennan@...cle.com,
linux-kernel@...r.kernel.org,
nishimura@...inux.co.jp,
taka@...inux.co.jp
Subject: Re: [PATCH] printk: CPU backtrace not printing on panic
Hi Petr and John,
Thanks for the feeedback!
Sorry that I was not keeping up with the recent nbcon updates and I
don't think the patch I sent was reflecting them.
On 2024-07-26, John Ogness wrote:
>On 2024-07-26, Petr Mladek <pmladek@...e.com> wrote:
>> I would do it the other way and enable printing from other CPUs only
>> when triggring the backtrace. We could do it because
>> trigger_all_cpu_backtrace() waits until all backtraces are
>> printed.
>>
Yes, I think doing it other way around works better filtering out
the non-panicked CPU's irrelevant messages as it does now.
As suggested, it also makes it simple as it removes the need for
what I suggested (bringing back commit 13fb0f74d702 ("printk: Avoid
livelock with heavy printk during panic")) which intention was
to check wheather to trigger backtrace or not based on those
irrelevant messages.
>> Something like:
>>
>> diff --git a/include/linux/panic.h b/include/linux/panic.h
>> index 3130e0b5116b..980bacbdfcfc 100644
>> --- a/include/linux/panic.h
>> +++ b/include/linux/panic.h
>> @@ -16,6 +16,7 @@ extern void oops_enter(void);
>> extern void oops_exit(void);
>> extern bool oops_may_print(void);
>>
>> +extern int panic_triggering_all_cpu_backtrace;
>> extern int panic_timeout;
>> extern unsigned long panic_print;
>> extern int panic_on_oops;
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index f861bedc1925..7e9e97d59b1e 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -64,6 +64,8 @@ unsigned long panic_on_taint;
>> bool panic_on_taint_nousertaint = false;
>> static unsigned int warn_limit __read_mostly;
>>
>> +int panic_triggering_all_cpu_backtrace;
>> +
>> int panic_timeout = CONFIG_PANIC_TIMEOUT;
>> EXPORT_SYMBOL_GPL(panic_timeout);
>>
>> @@ -253,8 +255,12 @@ void check_panic_on_warn(const char *origin)
>> */
>> static void panic_other_cpus_shutdown(bool crash_kexec)
>> {
>> - if (panic_print & PANIC_PRINT_ALL_CPU_BT)
>> + if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
>> + /* Temporary allow printing messages on non-panic CPUs. */
>> + panic_triggering_all_cpu_backtrace = true;
>> trigger_all_cpu_backtrace();
>> + panic_triggering_all_cpu_backtrace = false;
>
>Note, here we should also add
>
> nbcon_atomic_flush_pending();
>
>Your suggestion allows the other CPUs to dump their backtrace into the
>ringbuffer, but they are still forbidden from acquiring the nbcon
>console contexts for printing. That is a necessary requirement of
>nbcon_waiter_matches().
Thanks for pointing it out. I see that it can only allows to write into
the ringbuffer and printing also needs to be taken care as well.
>Or since the cpu_sync is held while printing the backtrace, we could
>allow the non-panic CPUs to print by modifying the check in
>nbcon_context_try_acquire_direct():
>
>----- BEGIN -----
>diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
>index ef6e76db0f5a..cd8724840edc 100644
>--- a/kernel/printk/nbcon.c
>+++ b/kernel/printk/nbcon.c
>@@ -241,7 +241,7 @@ static int nbcon_context_try_acquire_direct(struct nbcon_context *ctxt,
> struct nbcon_state new;
>
> do {
>- if (other_cpu_in_panic())
>+ if (other_cpu_in_panic() && !__printk_cpu_sync_owner())
> return -EPERM;
>
> if (ctxt->prio <= cur->prio || ctxt->prio <= cur->req_prio)
>diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>index 613644c55e54..f486d0b84a84 100644
>--- a/kernel/printk/printk.c
>+++ b/kernel/printk/printk.c
>@@ -4569,6 +4569,11 @@ void __printk_cpu_sync_wait(void)
> }
> EXPORT_SYMBOL(__printk_cpu_sync_wait);
>
>+bool __printk_cpu_sync_owner(void)
>+{
>+ return (atomic_read(&printk_cpu_sync_owner) == raw_smp_processor_id());
>+}
>+
> /**
> * __printk_cpu_sync_try_get() - Try to acquire the printk cpu-reentrant
> * spinning lock.
>----- END -----
>
>> + }
>>
>> /*
>> * Note that smp_send_stop() is the usual SMP shutdown function,
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 054c0e7784fd..c22b07049c38 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2316,7 +2316,7 @@ asmlinkage int vprintk_emit(int facility, int level,
>> * non-panic CPUs are generating any messages, they will be
>> * silently dropped.
>> */
>> - if (other_cpu_in_panic())
>> + if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace)
>> return 0;
>I wonder if it is enough to check if it is holding the cpu_sync. Then we
>would not need @panic_triggering_all_cpu_backtrace.
Taking a look at the cpu_sync, it seems CPU backtrace is the only one holding
the cpu_sync after panic. If so, I also think that it can replace
@panic_triggering_all_cpu_backtrace.
But I thought that we still need @panic_triggering_all_cpu_backtrace
to let non-panic CPUs writing their backtrace to ringbuffer if we were to
use cpu_sync to pass the check within nbcon_context_try_acquire_direct().
Or can we use cpu_sync for checking wheather non-panicked CPUs can write to
ringbuffer and let nbcon_atomic_flush_pending() do the printing something like:
----- BEGIN -----
diff --git a/kernel/panic.c b/kernel/panic.c
index 7e207092576b..eed76e3e061b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -252,8 +252,10 @@ void check_panic_on_warn(const char *origin)
*/
static void panic_other_cpus_shutdown(bool crash_kexec)
{
- if (panic_print & PANIC_PRINT_ALL_CPU_BT)
+ if (panic_print & PANIC_PRINT_ALL_CPU_BT) {
trigger_all_cpu_backtrace();
+ nbcon_atomic_flush_pending();
+ }
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d0bff0b0abfd..b8132801ea07 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2354,7 +2354,7 @@ asmlinkage int vprintk_emit(int facility, int level,
* non-panic CPUs are generating any messages, they will be
* silently dropped.
*/
- if (other_cpu_in_panic())
+ if (other_cpu_in_panic() && !__printk_cpu_sync_owner())
return 0;
if (level == LOGLEVEL_SCHED) {
@@ -4511,6 +4511,11 @@ void __printk_cpu_sync_wait(void)
}
EXPORT_SYMBOL(__printk_cpu_sync_wait);
+bool __printk_cpu_sync_owner(void)
+{
+ return (atomic_read(&printk_cpu_sync_owner) == raw_smp_processor_id());
+}
+
----- END -----
>John
Sincerely,
Ryo Takakura
Powered by blists - more mailing lists