lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZhfwXsEE2Y8IPPxX@localhost.localdomain>
Date: Thu, 11 Apr 2024 16:14:54 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in
 console_flush_all()

On Wed 2024-04-03 00:17:19, John Ogness wrote:
> Allow nbcon consoles to print messages in the legacy printk()
> caller context (printing via unlock) by integrating them into
> console_flush_all(). The write_atomic() callback is used for
> printing.

Hmm, this patch tries to flush nbcon console even in context
with NBCON_PRIO_NORMAL. Do we really want this, please?

I would expect that it would do so only when the kthread
is not working.

> Provide nbcon_legacy_emit_next_record(), which acts as the
> nbcon variant of console_emit_next_record(). Call this variant
> within console_flush_all() for nbcon consoles. Since nbcon
> consoles use their own @nbcon_seq variable to track the next
> record to print, this also must be appropriately handled.

I have been a bit confused by all the boolean return values
and what _exactly_ they mean. IMHO, we should make it more
clear how it works when it can't acquire the context.

IMHO, it is is importnat because console_flush_all() interprets
nbcon_legacy_emit_next_record() return value as @progress even when
there is no guaranteed progress. We just expect that
the other context is doing something.

It feels like it might get stuck forewer in some situatuon.
It would be good to understand if it is OK or not.


Later update:

Hmm, console_flush_all() is called from console_unlock().
It might be called in atomic context. But the current
owner might be theoretically scheduled out.

This is from documentation of nbcon_context_try_acquire()

/**
 * nbcon_context_try_acquire - Try to acquire nbcon console
 * @ctxt:	The context of the caller
 *
 * Context:	Any context which could not be migrated to another CPU.


I can't find any situation where nbcon_context_try_acquire() is
currently called in normal (schedulable) context. This is probably
why you did not see any problems with testing.

I see 3 possible solutions:

  1. Enforce that nbcon context can be acquired only with preemtion
     disabled.

  2. Enforce that nbcon context can be acquired only with
     interrupts. It would prevent deadlock when some future
     code interrupt flush in NBCON_PRIO_EMERGENCY context.
     And then a potential nested console_flush_all() won't be
     able to takeover the interrupted NBCON_PRIO_CONTEXT
     and there will be no progress.

  3. console_flush_all() should ignore nbcon console when
     it is not able to get the context, aka no progress.


I personally prefer the 3rd solution because I have spent
last 12 years on attempts to move printk into preemtible
context. And it looks wrong to move into atomic context.

Warning: console_flush_all() suddenly won't guarantee flushing
	 all messages.

	 I am not completely sure about all the consequences until
	 I see the rest of the patchset and the kthread intergration.
	 We will somehow need to guarantee that all messages
	 are flushed.


I propose the following changes to make console_flush_all()
safe. The patch is made on top of this patchset. Feel free
to squash them into your patch and remove my SoB.


>From 8dd9c0be5ab693c6976a0f7f8b99de48ecebcbf6 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek@...e.com>
Date: Thu, 11 Apr 2024 15:45:53 +0200
Subject: [PATCH] printk: nbcon: Prevent deadlock when flushing nbcon consoles
 in the legacy loop

Ignore pending records in nbcon consoles in the legacy console_flush_all()
loop when the nbcon console context can't be acquired. They will be
flushed either by the current nbcon context owner or to-be-introduced
printk kthread.

Signed-off-by: Petr Mladek <pmladek@...e.com>
---
 kernel/printk/nbcon.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index c57ad34a8d56..88c40f165be4 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -964,8 +964,12 @@ static __ref unsigned int *nbcon_get_cpu_emergency_nesting(void)
  *				write_atomic() callback
  * @wctxt:	An initialized write context struct to use for this context
  *
- * Return:	False if it is known there are no more records to print,
- *		otherwise true.
+ * Return:	True, when a record has been printed and there are still
+ *		pending records. The caller might want to continue flushing.
+ *
+ *		False, when there is no pending record, or when the console
+ *		context can't be acquired, or the ownership has been lost.
+ *		The caller should give up. Either the job is or	can't be done.
  *
  * This is an internal helper to handle the locking of the console before
  * calling nbcon_emit_next_record().
@@ -975,7 +979,7 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
 	struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
 
 	if (!nbcon_context_try_acquire(ctxt))
-		return true;
+		return false;
 
 	/*
 	 * nbcon_emit_next_record() returns false when the console was
@@ -983,7 +987,7 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
 	 * longer valid.
 	 */
 	if (!nbcon_emit_next_record(wctxt))
-		return true;
+		return false;
 
 	nbcon_context_release(ctxt);
 
@@ -1023,8 +1027,13 @@ enum nbcon_prio nbcon_get_default_prio(void)
  * @cookie:	The cookie from the SRCU read lock.
  *
  * Context:	Any context except NMI.
- * Return:	False if the given console has no next record to print,
- *		otherwise true.
+ *
+ * Return:	True, when a record has been printed and there are still
+ *		pending records. The caller might want to continue flushing.
+ *
+ *		False, when there is no pending record, or when the console
+ *		context can't be acquired, or the ownership has been lost.
+ *		The caller should give up. Either the job is or	can't be done.
  *
  * This function is meant to be called by console_flush_all() to print records
  * on nbcon consoles from legacy context (printing via console unlocking).
-- 
2.44.0


Best Regards,
Petr

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ