[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aRWqei9jA8gcM-sD@pathway.suse.cz>
Date: Thu, 13 Nov 2025 10:52:58 +0100
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
Steven Rostedt <rostedt@...dmis.org>,
Sherry Sun <sherry.sun@....com>, Jacky Bai <ping.bai@....com>,
Jon Hunter <jonathanh@...dia.com>,
Thierry Reding <thierry.reding@...il.com>,
Derek Barbosa <debarbos@...hat.com>, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH printk v1 1/1] printk: Avoid scheduling irq_work on
suspend
On Wed 2025-11-12 16:56:22, John Ogness wrote:
> On 2025-11-11, Petr Mladek <pmladek@...e.com> wrote:
> >> Introduce a new global variable @console_offload_blocked to flag
> >> when irq_work queueing is to be avoided. The flag is used by
> >> printk_get_console_flush_type() to avoid allowing deferred printing
> >> and switch NBCON consoles to atomic flushing. It is also used by
> >> vprintk_emit() to avoid klogd waking.
> >>
> >> --- a/kernel/printk/internal.h
> >> +++ b/kernel/printk/internal.h
> >> @@ -230,6 +230,8 @@ struct console_flush_type {
> >> bool legacy_offload;
> >> };
> >>
> >> +extern bool console_irqwork_blocked;
> >> +
> >> /*
> >> * Identify which console flushing methods should be used in the context of
> >> * the caller.
> >> @@ -241,7 +243,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
> >> switch (nbcon_get_default_prio()) {
> >> case NBCON_PRIO_NORMAL:
> >> if (have_nbcon_console && !have_boot_console) {
> >> - if (printk_kthreads_running)
> >> + if (printk_kthreads_running && !console_irqwork_blocked)
> >> ft->nbcon_offload = true;
> >> else
> >> ft->nbcon_atomic = true;
> >> @@ -251,7 +253,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
> >> if (have_legacy_console || have_boot_console) {
> >> if (!is_printk_legacy_deferred())
> >> ft->legacy_direct = true;
> >> - else
> >> + else if (!console_irqwork_blocked)
> >> ft->legacy_offload = true;
> >> }
> >> break;
> >
> > This is one possibility.
> >
> > Another possibility would be to block the irq work
> > directly in defer_console_output() and wake_up_klogd().
> > It would handle all situations, including printk_trigger_flush()
> > or defer_console_output().
> >
> > Or is there any reason, why these two call paths are not handled?
>
> My intention was to focus only on irq_work triggered directly by
> printk() calls as well as transitioning NBCON from threaded to direct.
>
> > I do not have strong opinion. This patch makes it more explicit
> > when defer_console_output() or wake_up_klogd() is called.
> >
> > If we move the check into defer_console_output() or wake_up_klogd(),
> > it would hide the behavior. But it will make the API more safe
> > to use. And wake_up_klogd() is even exported via <linux/printk.h>.
>
> Earlier test versions of this patch did exactly as you are suggesting
> here. But I felt like "properly avoiding" deferred/offloaded printing
> via printk_get_console_flush_type() (rather than just hacking
> irq_work_queue() callers to abort) was a cleaner solution. Especially
> since printk_get_console_flush_type() modifications were needed anyway
> in order to transition NBCON from threaded to direct.
I see.
> >> @@ -264,7 +266,7 @@ static inline void printk_get_console_flush_type(struct console_flush_type *ft)
> >> if (have_legacy_console || have_boot_console) {
> >> if (!is_printk_legacy_deferred())
> >> ft->legacy_direct = true;
> >> - else
> >> + else if (!console_irqwork_blocked)
> >> ft->legacy_offload = true;
> >
> > This change won't be needed if we move the check into
> > defer_console_output() and wake_up_klogd().
> >
> >> }
> >> break;
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index 5aee9ffb16b9a..94fc4a8662d4b 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -2426,7 +2429,7 @@ asmlinkage int vprintk_emit(int facility, int level,
> >>
> >> if (ft.legacy_offload)
> >> defer_console_output();
> >> - else
> >> + else if (!console_irqwork_blocked)
> >> wake_up_klogd();
> >
> > Same here.
> >
> >>
> >> return printed_len;
>
> I would prefer to keep all the printk_get_console_flush_type() changes
> since it returns proper available flush type information. If you would
> like to _additionally_ short-circuit __wake_up_klogd() and
> nbcon_kthreads_wake() in order to avoid all possible irq_work queueing,
> I would be OK with that.
Combining both approaches might be a bit messy. Some code paths might
work correctly because of the explicit check and some just by chance.
But I got an idea. We could add a warning into __wake_up_klogd()
and nbcon_kthreads_wake() to report when they are called unexpectedly.
And we should also prevent calling it from lib/nmi_backtrace.c.
I played with it and came up with the following changes on
top of this patch:
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 45c663124c9b..71e31b908ad1 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -203,6 +203,7 @@ void dump_stack_print_info(const char *log_lvl);
void show_regs_print_info(const char *log_lvl);
extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
extern asmlinkage void dump_stack(void) __cold;
+void printk_try_flush(void);
void printk_trigger_flush(void);
void console_try_replay_all(void);
void printk_legacy_allow_panic_sync(void);
@@ -299,6 +300,9 @@ static inline void dump_stack_lvl(const char *log_lvl)
static inline void dump_stack(void)
{
}
+static inline void printk_try_flush(void)
+{
+}
static inline void printk_trigger_flush(void)
{
}
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index ffd5a2593306..a09b8502e507 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1302,6 +1302,13 @@ void nbcon_kthreads_wake(void)
if (!printk_kthreads_running)
return;
+ /*
+ * Nobody is allowed to call this function when console irq_work
+ * is blocked.
+ */
+ if (WARN_ON_ONCE(console_irqwork_blocked))
+ return;
+
cookie = console_srcu_read_lock();
for_each_console_srcu(con) {
if (!(console_srcu_read_flags(con) & CON_NBCON))
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 334b4edff08c..e9290c418d12 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -4581,6 +4581,13 @@ static void __wake_up_klogd(int val)
if (!printk_percpu_data_ready())
return;
+ /*
+ * Nobody is allowed to call this function when console irq_work
+ * is blocked.
+ */
+ if (WARN_ON_ONCE(console_irqwork_blocked))
+ return;
+
preempt_disable();
/*
* Guarantee any new records can be seen by tasks preparing to wait
@@ -4637,6 +4644,21 @@ void defer_console_output(void)
__wake_up_klogd(PRINTK_PENDING_WAKEUP | PRINTK_PENDING_OUTPUT);
}
+void printk_try_flush(void)
+{
+ struct console_flush_type ft;
+
+ printk_get_console_flush_type(&ft);
+ if (ft.nbcon_atomic)
+ nbcon_atomic_flush_pending();
+ if (ft.legacy_direct) {
+ if (console_trylock())
+ console_unlock();
+ }
+ if (ft.legacy_offload)
+ defer_console_output();
+}
+
void printk_trigger_flush(void)
{
defer_console_output();
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 33c154264bfe..632bbc28cb79 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -78,10 +78,10 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
nmi_backtrace_stall_check(to_cpumask(backtrace_mask));
/*
- * Force flush any remote buffers that might be stuck in IRQ context
+ * Try flushing messages added CPUs which might be stuck in IRQ context
* and therefore could not run their irq_work.
*/
- printk_trigger_flush();
+ printk_try_flush();
clear_bit_unlock(0, &backtrace_flag);
put_cpu();
How does it look, please?
Best Regards,
Petr
Powered by blists - more mailing lists