linux-kernel - Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush

Message-ID: <ZhgCgBK7JdRruvkj@localhost.localdomain>
Date: Thu, 11 Apr 2024 17:32:16 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in
 console_flush_all()

On Thu 2024-04-11 16:14:58, Petr Mladek wrote:
> On Wed 2024-04-03 00:17:19, John Ogness wrote:
> > Allow nbcon consoles to print messages in the legacy printk()
> > caller context (printing via unlock) by integrating them into
> > console_flush_all(). The write_atomic() callback is used for
> > printing.
> 
> Hmm, this patch tries to flush nbcon console even in context
> with NBCON_PRIO_NORMAL. Do we really want this, please?
> 
> I would expect that it would do so only when the kthread
> is not working.
> 
> > Provide nbcon_legacy_emit_next_record(), which acts as the
> > nbcon variant of console_emit_next_record(). Call this variant
> > within console_flush_all() for nbcon consoles. Since nbcon
> > consoles use their own @nbcon_seq variable to track the next
> > record to print, this also must be appropriately handled.
> 
> I have been a bit confused by all the boolean return values
> and what _exactly_ they mean. IMHO, we should make it more
> clear how it works when it can't acquire the context.
> 
> IMHO, it is important because console_flush_all() interprets
> nbcon_legacy_emit_next_record() return value as @progress even when
> there is no guaranteed progress. We just expect that
> the other context is doing something.
>
> It feels like it might get stuck forever in some situation.
> It would be good to understand if it is OK or not.
> 
> 
> Later update:
> 
> Hmm, console_flush_all() is called from console_unlock().
> It might be called in atomic context. But the current
> owner might be theoretically scheduled out.
> 
> This is from the documentation of nbcon_context_try_acquire():
> 
> /**
>  * nbcon_context_try_acquire - Try to acquire nbcon console
>  * @ctxt:	The context of the caller
>  *
>  * Context:	Any context which could not be migrated to another CPU.
> 
> 
> I can't find any situation where nbcon_context_try_acquire() is
> currently called in normal (schedulable) context. This is probably
> why you did not see any problems with testing.
> 
> I see 3 possible solutions:
> 
>   1. Enforce that nbcon context can be acquired only with preemption
>      disabled.
> 
>   2. Enforce that nbcon context can be acquired only with
>      interrupts disabled. It would prevent a deadlock when some
>      future code interrupts a flush in NBCON_PRIO_EMERGENCY context.
>      A potential nested console_flush_all() would then not be
>      able to take over the interrupted NBCON_PRIO_EMERGENCY context
>      and there would be no progress.
> 
>   3. console_flush_all() should ignore nbcon console when
>      it is not able to get the context, aka no progress.
> 
> 
> I personally prefer the 3rd solution because I have spent
> the last 12 years on attempts to move printk into preemptible
> context. And it looks wrong to move into atomic context.
> 
> Warning: console_flush_all() suddenly won't guarantee flushing
> 	 all messages.
> 
> 	 I am not completely sure about all the consequences until
> 	 I see the rest of the patchset and the kthread integration.
> 	 We will somehow need to guarantee that all messages
> 	 are flushed.

I am trying to build a full picture of when and how the nbcon consoles
will get flushed. My current understanding and view is the following,
starting from the easiest priority:


  1. NBCON_PRIO_PANIC messages will be flushed by calling
     nbcon_atomic_flush_pending() directly in vprintk_emit()

     This will take care of any previously added messages.

     Non-panic CPUs are not allowed to add messages anymore
     when there is a panic in progress.

     [ALL OK]


  2. NBCON_PRIO_EMERGENCY messages will be flushed by calling
     nbcon_atomic_flush_pending() directly in nbcon_cpu_emergency_exit().

     This would cover all previously added messages, including
     the ones printed by the code between
     nbcon_cpu_emergency_enter()/exit().

     This won't cover later added messages which might be
     a problem. Let's look at this closer. Later added
     messages with:

	+ NBCON_PRIO_PANIC will be handled in vprintk_emit()
	  as explained above [OK]

	+ NBCON_PRIO_EMERGENCY will be handled in the
	  related nbcon_cpu_emergency_exit() as described here.
	  [OK]

	+ NBCON_PRIO_NORMAL will be handled, see below. [?]

     [ PROBLEM: later added NBCON_PRIO_NORMAL messages, see below. ]


  3. NBCON_PRIO_NORMAL messages will be flushed by:

       + the printk kthread when it is available

       + the legacy loop via

	 + console_unlock()
	    + console_flush_all()
	      + console nbcon_legacy_emit_next_record() [PROBLEM]


PROBLEM: console_flush_all() does not guarantee progress with
	 nbcon consoles as explained above (previous mail).
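
To make the overview above easier to check, here is a rough userspace
sketch of which path is expected to flush a message of a given priority.
It is only a toy model, not kernel code: all toy_* names are
hypothetical, and it assumes that the legacy loop handles
NBCON_PRIO_NORMAL only when the kthread is not available, which is a
simplification of the three cases listed above.

```c
#include <stdbool.h>

/* Priorities named after the ones in this mail; the values are arbitrary. */
enum toy_prio {
	TOY_PRIO_NORMAL,
	TOY_PRIO_EMERGENCY,
	TOY_PRIO_PANIC,
};

/* Hypothetical labels for the flush paths described above. */
enum toy_flusher {
	TOY_FLUSH_VPRINTK_EMIT,		/* nbcon_atomic_flush_pending() in vprintk_emit() */
	TOY_FLUSH_EMERGENCY_EXIT,	/* nbcon_atomic_flush_pending() in nbcon_cpu_emergency_exit() */
	TOY_FLUSH_KTHREAD,		/* the printk kthread */
	TOY_FLUSH_LEGACY_LOOP,		/* console_unlock() -> console_flush_all() */
};

/* Assumption: NORMAL falls back to the legacy loop only without a kthread. */
static enum toy_flusher toy_flusher_for(enum toy_prio prio, bool kthread_available)
{
	switch (prio) {
	case TOY_PRIO_PANIC:
		return TOY_FLUSH_VPRINTK_EMIT;
	case TOY_PRIO_EMERGENCY:
		return TOY_FLUSH_EMERGENCY_EXIT;
	default:
		return kthread_available ? TOY_FLUSH_KTHREAD : TOY_FLUSH_LEGACY_LOOP;
	}
}
```

The only entry that depends on the system state is NBCON_PRIO_NORMAL,
which is also the case marked as a problem above.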


My proposal:

	1. console_flush_all() will flush nbcon consoles only
	   in NBCON_PRIO_NORMAL and when the kthreads are not
	   available.

	   It will make it clear that this is the flusher in
	   this situation.


	2. Allow skipping nbcon consoles in console_flush_all() when
	   it can't take the context (as suggested in my previous
	   reply).

	   This won't guarantee flushing NORMAL messages added
	   while nbcon_cpu_emergency_exit() calls
	   nbcon_atomic_flush_pending().

	   Solve this problem by introducing[*] nbcon_atomic_flush_all()
	   which would flush even newly added messages and
	   call this in nbcon_cpu_emergency_exit() when the printk
	   kthread does not work. It should bail out when there
	   is a panic in progress.

	   Motivation: It does not matter which "atomic" context
		flushes NORMAL/EMERGENCY messages when
		the printk kthread is not available.

	  [*] Alternatively we could modify nbcon_atomic_flush_pending()
	      to flush even newly added messages when the kthread is
	      not working. But it might create another mess.
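
The two points of the proposal could be sketched roughly as below. This
is a single-threaded userspace toy model, not a patch: the toy_*
structures and functions are hypothetical, the real
console_flush_all() and nbcon code look different, and ownership of a
console is reduced to one boolean. It only illustrates the intended
rules: the legacy loop touches nbcon consoles only when the kthread is
not available, a failed acquire is not counted as progress, and the
"flush all" variant loops until it catches up or a panic is in progress.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified console state. */
struct toy_console {
	bool is_nbcon;
	bool owned_by_other;	/* another context currently owns the console */
	unsigned long seq;	/* next record this console has to print */
};

/* Hypothetical global state. */
struct toy_state {
	unsigned long next_seq;	/* sequence number of the next record to be stored */
	bool kthread_available;
	bool panic_in_progress;
};

/* Stand-in for a try-acquire that fails when somebody else is the owner. */
static bool toy_try_acquire(const struct toy_console *con)
{
	return !con->owned_by_other;
}

/* "Print" one record; returns false when there is nothing left to print. */
static bool toy_emit_next_record(struct toy_console *con, const struct toy_state *st)
{
	if (con->seq >= st->next_seq)
		return false;
	con->seq++;
	return true;
}

/* Legacy loop: report progress only when a record was really printed. */
static void toy_console_flush_all(struct toy_console *cons, size_t n,
				  const struct toy_state *st)
{
	bool progress;

	do {
		progress = false;
		for (size_t i = 0; i < n; i++) {
			struct toy_console *con = &cons[i];

			if (con->is_nbcon) {
				/* Point 1: the kthread is the flusher when available. */
				if (st->kthread_available)
					continue;
				/* Point 2: skip the console; this is not progress. */
				if (!toy_try_acquire(con))
					continue;
			}
			if (toy_emit_next_record(con, st))
				progress = true;
		}
	} while (progress);
}

/*
 * Flush everything, including records added while flushing. Returns false
 * when it had to bail out (panic in progress or console owned elsewhere).
 */
static bool toy_nbcon_atomic_flush_all(struct toy_console *con,
				       const struct toy_state *st)
{
	while (con->seq < st->next_seq) {
		if (st->panic_in_progress || !toy_try_acquire(con))
			return false;
		toy_emit_next_record(con, st);
	}
	return true;
}
```

In this model the legacy loop always terminates because a skipped nbcon
console never reports progress. The price is the one mentioned above:
the skipped console may stay behind until some other context flushes it.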

How does it sound, please?
Or am I missing anything?

Best Regards,
Petr