[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1636158295.cxlln6r0kk.astroid@bobo.none>
Date: Sat, 06 Nov 2021 10:26:11 +1000
From: Nicholas Piggin <npiggin@...il.com>
To: John Ogness <john.ogness@...utronix.de>,
Petr Mladek <pmladek@...e.com>
Cc: Laurent Dufour <ldufour@...ux.ibm.com>,
linux-kernel@...r.kernel.org
Subject: Re: Removal of printk safe buffers delays NMI context printk
Excerpts from John Ogness's message of November 6, 2021 2:44 am:
> On 2021-11-05, Petr Mladek <pmladek@...e.com> wrote:
>> On Fri 2021-11-05 15:03:27, John Ogness wrote:
>>> On 2021-11-05, Nicholas Piggin <npiggin@...il.com> wrote:
>>>> but we do need that printk flush capability back there and for
>>>> nmi_backtrace.
>>>
>>> Agreed. I had not considered this necessary side-effect when I
>>> removed the NMI safe buffers.
>>
>> Honestly, I do not understand why it stopped working or how
>> it worked before.
>
> IIUC, Nick is presenting a problem where a lockup on the other CPUs is
> detected. Those CPUs will dump their backtraces per NMI context. But in
> their lockup state the irq_work for those CPUs is not functional. So
> even though the messages are in the buffer, there is no one printing the
> buffer.
Yep.
> printk_safe_flush() would dump the NMI safe buffers for all the CPUs
> into the printk buffer, then trigger an irq_work on itself (the
> non-locked-up CPU).
>
> That irq_work trigger was critical, because the other CPUs (which also
> triggered irq_works for themselves) aren't able to process irq_works. I
> did not consider this case. Which is why we still need to trigger
> irq_work here. (Or, as the removed comment hinted at, add some printk()
> call to either directly print or trigger the irq_work.)
This patch seems to work, I can submit it if you'd like?
Thanks,
Nick
---
arch/powerpc/kernel/watchdog.c | 6 ++++++
include/linux/printk.h | 4 ++++
kernel/printk/printk.c | 5 +++++
lib/nmi_backtrace.c | 6 ++++++
4 files changed, 21 insertions(+)
diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index 5f69ba4de1f3..c8017bc23b00 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -227,6 +227,12 @@ static void watchdog_smp_panic(int cpu)
cpumask_clear(&wd_smp_cpus_ipi);
}
+ /*
+ * Force flush any remote buffers that might be stuck in IRQ context
+ * and therefore could not run their irq_work.
+ */
+ printk_trigger_flush();
+
if (hardlockup_panic)
nmi_panic(NULL, "Hard LOCKUP");
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 85b656f82d75..9497f6b98339 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -198,6 +198,7 @@ void dump_stack_print_info(const char *log_lvl);
void show_regs_print_info(const char *log_lvl);
extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
extern asmlinkage void dump_stack(void) __cold;
+void printk_trigger_flush(void);
#else
static inline __printf(1, 0)
int vprintk(const char *s, va_list args)
@@ -274,6 +275,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
static inline void dump_stack(void)
{
}
+static inline void printk_trigger_flush(void)
+{
+}
#endif
#ifdef CONFIG_SMP
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a8d0a58deebc..99221b016c68 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3252,6 +3252,11 @@ void defer_console_output(void)
preempt_enable();
}
+void printk_trigger_flush(void)
+{
+ defer_console_output();
+}
+
int vprintk_deferred(const char *fmt, va_list args)
{
int r;
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index f9e89001b52e..199ab201d501 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -75,6 +75,12 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
touch_softlockup_watchdog();
}
+ /*
+ * Force flush any remote buffers that might be stuck in IRQ context
+ * and therefore could not run their irq_work.
+ */
+ printk_trigger_flush();
+
clear_bit_unlock(0, &backtrace_flag);
put_cpu();
}
--
2.23.0
Powered by blists - more mailing lists