Message-ID: <CALqELGzia2BObxPuDPiH-3HSDAf+r6HWmpD+iFAr+v22QGwoTw@mail.gmail.com>
Date: Mon, 29 Sep 2025 09:40:03 +0100
From: Andrew Murray <amurray@...goodpenguin.co.uk>
To: Petr Mladek <pmladek@...e.com>
Cc: John Ogness <john.ogness@...utronix.de>, 
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Jiri Slaby <jirislaby@...nel.org>, 
	Sergey Senozhatsky <senozhatsky@...omium.org>, Steven Rostedt <rostedt@...dmis.org>, 
	Thomas Gleixner <tglx@...utronix.de>, Esben Haabendal <esben@...nix.com>, linux-serial@...r.kernel.org, 
	linux-kernel@...r.kernel.org, 
	Andy Shevchenko <andriy.shevchenko@...ux.intel.com>, Arnd Bergmann <arnd@...db.de>, 
	Tony Lindgren <tony@...mide.com>, Niklas Schnelle <schnelle@...ux.ibm.com>, 
	Serge Semin <fancer.lancer@...il.com>
Subject: Re: [PATCH 1/3] printk/nbcon: Block printk kthreads when any CPU is
 in an emergency context

On Fri, 26 Sept 2025 at 13:50, Petr Mladek <pmladek@...e.com> wrote:
>
> In emergency contexts, printk() tries to flush messages directly even
> on nbcon consoles. And it is allowed to takeover the console ownership
> and interrupt the printk kthread in the middle of a message.
>
> A single takeover and one replayed message should be enough in most
> situations. The first emergency message flushes the backlog and
> the printk kthreads go to sleep. Subsequent emergency messages are
> flushed directly and printk() does not wake up the kthreads.
>
> However, a single takeover is not guaranteed. Any printk() in normal
> context on another CPU could wake up the kthreads, or a new emergency
> message might be added before the kthreads get to sleep. Note that
> the interrupted .write_thread() callbacks usually have to call
> nbcon_reacquire_nobuf() and restore the original device settings
> before checking for pending messages.
>
> The risk of repeated takeovers will grow even bigger because
> __nbcon_atomic_flush_pending_con() is going to release the console
> ownership after each emitted record. This will be needed to prevent
> hardlockup reports on other CPUs which are busy waiting for
> the context ownership, for example, in nbcon_reacquire_nobuf() or
> __uart_port_nbcon_acquire().
>
> The repeated takeovers break the output, for example:
>
>     [ 5042.650211][ T2220] Call Trace:
>     [ 5042.6511
>     ** replaying previous printk message **
>     [ 5042.651192][ T2220]  <TASK>
>     [ 5042.652160][ T2220]  kunit_run_
>     ** replaying previous printk message **
>     [ 5042.652160][ T2220]  kunit_run_tests+0x72/0x90
>     [ 5042.653340][ T22
>     ** replaying previous printk message **
>     [ 5042.653340][ T2220]  ? srso_alias_return_thunk+0x5/0xfbef5
>     [ 5042.654628][ T2220]  ? stack_trace_save+0x4d/0x70
>     [ 5042.6553
>     ** replaying previous printk message **
>     [ 5042.655394][ T2220]  ? srso_alias_return_thunk+0x5/0xfbef5
>     [ 5042.656713][ T2220]  ? save_trace+0x5b/0x180
>
> A more robust solution is to block the printk kthread entirely whenever
> *any* CPU enters an emergency context. This ensures that critical messages
> can be flushed without contention from the normal, non-atomic printing
> path.
>
> Link: https://lore.kernel.org/all/aNQO-zl3k1l4ENfy@pathway.suse.cz
> Signed-off-by: Petr Mladek <pmladek@...e.com>
> ---
>  kernel/printk/nbcon.c | 32 +++++++++++++++++++++++++++++++-
>  1 file changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index d5d8c8c657e0..08b196e898cd 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -117,6 +117,9 @@
>   * from scratch.
>   */
>
> +/* Counter of active nbcon emergency contexts. */
> +atomic_t nbcon_cpu_emergency_cnt;
> +
>  /**
>   * nbcon_state_set - Helper function to set the console state
>   * @con:       Console to update
> @@ -1168,6 +1171,16 @@ static bool nbcon_kthread_should_wakeup(struct console *con, struct nbcon_contex
>         if (kthread_should_stop())
>                 return true;
>
> +       /*
> +        * Block the kthread when the system is in an emergency or panic mode.
> +        * It increases the chance that these contexts would be able to show
> +        * the messages directly. And it reduces the risk of interrupted writes
> +        * where the context with a higher priority takes over the nbcon console
> +        * ownership in the middle of a message.
> +        */
> +       if (unlikely(atomic_read(&nbcon_cpu_emergency_cnt)))
> +               return false;
> +
>         cookie = console_srcu_read_lock();
>
>         flags = console_srcu_read_flags(con);
> @@ -1219,6 +1232,13 @@ static int nbcon_kthread_func(void *__console)
>                 if (kthread_should_stop())
>                         return 0;
>
> +               /*
> +                * Block the kthread when the system is in an emergency or panic
> +                * mode. See nbcon_kthread_should_wakeup() for more details.
> +                */
> +               if (unlikely(atomic_read(&nbcon_cpu_emergency_cnt)))
> +                       goto wait_for_event;
> +
>                 backlog = false;
>
>                 /*
> @@ -1660,6 +1680,8 @@ void nbcon_cpu_emergency_enter(void)
>
>         preempt_disable();
>
> +       atomic_inc(&nbcon_cpu_emergency_cnt);
> +
>         cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
>         (*cpu_emergency_nesting)++;
>  }
> @@ -1674,10 +1696,18 @@ void nbcon_cpu_emergency_exit(void)
>         unsigned int *cpu_emergency_nesting;
>
>         cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
> -
>         if (!WARN_ON_ONCE(*cpu_emergency_nesting == 0))
>                 (*cpu_emergency_nesting)--;
>
> +       /*
> +        * Wake up kthreads because there might be some pending messages
> +        * added by other CPUs with normal priority since the last flush
> +        * in the emergency context.
> +        */
> +       if (!WARN_ON_ONCE(atomic_read(&nbcon_cpu_emergency_cnt) == 0))
> +               if (atomic_dec_return(&nbcon_cpu_emergency_cnt) == 0)
> +                       nbcon_kthreads_wake();
> +
>         preempt_enable();
>  }
>
> --
> 2.51.0
>

Reviewed-by: Andrew Murray <amurray@...goodpenguin.co.uk>

Thanks,

Andrew Murray
