linux-kernel - Re: [PATCH printk v2 08/18] printk: nbcon: Stop threads on shutdown/reboot

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZnwGOaSLTkIB0csA@pathway.suse.cz>
Date: Wed, 26 Jun 2024 14:14:49 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH printk v2 08/18] printk: nbcon: Stop threads on
 shutdown/reboot

On Tue 2024-06-25 22:02:30, John Ogness wrote:
> On 2024-06-17, Petr Mladek <pmladek@...e.com> wrote:
> > This makes me wonder whether we need to stop the kthreads at all.
> 
> > How exactly does this make printk more robust during shutdown?
> 
> Because by stopping them, the printing threads are guaranteed to go
> silent before they suddenly freeze. Without the clean shutdown, I could
> create shutdown/reboot scenarios where the printing thread would stop
> mid-line. Then the atomic printing would never be able to finish because
> it is a non-panic situation and the thread was frozen while holding the
> context. The result: the user never sees the final printk messages.

I see.

> > Well, there is one differece. The kthreads could not interferre with
> > the direct flush when they are stopped.
> >
> > But we have the same problem also with suspend/resume. And we do not
> > stop the kthreads in this case.
> 
> Suspend/resume is something different. Suspend needs to stop _all_
> printing. I do not think it makes sense to shutdown threads for
> this. The consoles becoming temporarily !usable() is enough.

Just to be sure that we are on the same page. I made a closer
look at the suspend code. I see 4 important milestones:

  1. suspend_console();

     The consoles are intentionally stopped after this point.
     It allows to suspend the related hardware, for example,
     the serial port.


  2. pm_sleep_disable_secondary_cpus();

     Other CPUs are stopped at this point. The kthread might still
     be theoretically scheduled.


  3. local_irq_disable();

     IRQs get disabled. Neither the hardware nor the kthread could
     work after this point.


  4. system_state = SYSTEM_SUSPEND;

     The global value is set. vprintk_emit() starts calling
     nbcon_atomic_flush_pending().


>From some reasons, I thought that ordering was different and
the "system_state" was set much earlier. This is why I thought
that it would make sense to "stop" the kthreads.

But it seems that, in reality, the nbcon_atomic_flush_pending() in
vprintk_emit() would never do anything during suspend. The consoles
are already suspended (unusable) when the system_state gets set
to SYSTEM_SUSPEND.

It means that we do not need to synchronize the printk kthreads with
vprintk_emit() for suspend. The only important thing is
the pr_flush() in suspend_console().


> For shutdown, the consoles are usable() the entire time. I just want the
> threads to get out of the way before they freeze so that the user can
> see all the messages on shutdown/reboot.

I see what you mean now. The consoles are not suspended => they are
usable all the time.


> > Maybe, we should just add a check of
> >
> >        system_state > SYSTEM_RUNNING
> >
> > also into the nbcon_kthread_func(). Maybe, the kthread should not try
> > to flush the messages when they are flushed directly by
> > vprintk_emit().
> 
> It gets racy when we start relying on the contexts noticing some
> variables. By racy, I mean there are scenarios where there are unprinted
> records and no context is printing. I think it is easiest when the
> following steps occur within a notifier:
> 
> 1. notifier sets a flag to allow atomic printing from vprintk_emit()
> 
> 2. notifier stops all threads (with the kthread_should_stop() moved
>    inside the printing loop of nbcon_kthread_func())
> 
> 3. notifier performs nbcon_atomic_flush_pending()
> 
> This guarantees that no messages are lost and that all get flushed.

Makes sense.

Best Regards,
Petr