Date: Fri, 7 Jun 2024 15:17:06 +0200
From: Petr Mladek <pmladek@...e.com>
To: John Ogness <john.ogness@...utronix.de>
Cc: Sergey Senozhatsky <senozhatsky@...omium.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>, linux-kernel@...r.kernel.org,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH printk v2 04/18] printk: nbcon: Introduce printing
 kthreads

On Tue 2024-06-04 01:30:39, John Ogness wrote:
> From: Thomas Gleixner <tglx@...utronix.de>
> 
> Provide the main implementation for running a printer kthread
> per nbcon console that is takeover/handover aware.

The changes done by this patch deserve a more detailed commit message ;-)

I updated this commit message in parallel with commenting on the related
code changes. My later comments might explain the motivation for
writing the commit message this way.

<proposal>
Provide the main implementation for running a printer kthread per
nbcon console that is takeover/handover aware.

Once the kthread is running and available, it becomes responsible for
flushing any pending messages which were added in NBCON_PRIO_NORMAL
context. Namely, the legacy console_flush_all() and device_release()
no longer flush the console. And nbcon_atomic_flush_pending()
used by nbcon_cpu_emergency_exit() no longer flushes messages
added after the emergency ones.

The kthread uses a new write_thread() callback which is called with
both the device_lock() and the console context acquired.

The console ownership handling is necessary for synchronization against
write_atomic(), which is synchronized only via the console context
ownership.

The device_lock() serializes acquiring the console context with
NBCON_PRIO_NORMAL. It is needed when the device_lock() does not
disable preemption. It prevents the following race:

CPU0				CPU1

 [ task A ]

 nbcon_context_try_acquire()
   # success with NORMAL prio
   # .unsafe == false;  // safe for takeover

 [ schedule: task A -> B ]


				WARN_ON()
				  nbcon_atomic_flush_pending()
				    nbcon_context_try_acquire()
				      # success with EMERGENCY prio

				      # flushing
				      nbcon_context_release()

				      # HERE: con->nbcon_state is free
				      #       to take by anyone !!!


 nbcon_context_try_acquire()
   # success with NORMAL prio [ task B ]

 [ schedule: task B -> A ]

 nbcon_enter_unsafe()
   nbcon_context_can_proceed()

BUG: nbcon_context_can_proceed() returns "true" because
     the console is owned by a context on CPU0 with
     NBCON_PRIO_NORMAL.

     But it should return "false". The console is owned
     by a context from task B and we do the check
     in a context from task A.

The write context in the kthread will stay safe only when one of
the following conditions is true:

  1. The kthread is the only context which acquires the console
     with NBCON_PRIO_NORMAL.

  2. Any other caller acquires the console context with NBCON_PRIO_NORMAL
     under the device_lock().

  3. Any other caller acquires the console context with NBCON_PRIO_NORMAL
     with preemption disabled and releases the context before
     rescheduling.

It is even double guaranteed. First, __nbcon_atomic_flush_pending_con()
is called:

  + with interrupts disabled from nbcon_atomic_flush_pending_con()
  + under device_lock() from nbcon_device_release()

Second, they are no longer called with NBCON_PRIO_NORMAL when
con->kthread exists.

Also note that the kthreads are running with nice -20 so that they are
scheduled shortly after a printk call and with generous runtime
budgets.

This patch provides just the basic functionality. The followup changes
will add the necessary integration, for example, creating the kthreads
at the right time or waking them when a new message appears.
</proposal>
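
To make condition 3 concrete, an acquire with NBCON_PRIO_NORMAL outside
of the device_lock() would have to follow this pattern (a minimal
sketch, not code from the patch):

	/*
	 * Sketch only: keep preemption disabled for the whole ownership
	 * period and release the context before rescheduling may happen.
	 */
	preempt_disable();
	if (nbcon_context_try_acquire(ctxt)) {
		/* ... emit records, verifying ownership as usual ... */
		nbcon_context_release(ctxt);
	}
	preempt_enable();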

BTW: It really looks like the safety is double guaranteed. Maybe
     the con->device_lock() is not needed in nbcon_kthread_func()
     after all. Well, I would keep it to be on the safe side.

> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -373,6 +376,27 @@ struct console {
>  	 */
>  	void (*write_atomic)(struct console *con, struct nbcon_write_context *wctxt);
>  
> +	/**
> +	 * @write_thread:
> +	 *
> +	 * NBCON callback to write out text in task context.
> +	 *
> +	 * This callback is called after device_lock() and with the nbcon
> +	 * console acquired. Any necessary driver synchronization should have
> +	 * been performed by the device_lock() callback.
> +	 *
> +	 * This callback is always called from task context but with migration
> +	 * disabled.
> +	 *
> +	 * The same criteria for console ownership verification and unsafe
> +	 * sections applies as with write_atomic(). The difference between
> +	 * this callback and write_atomic() is that this callback is used
> +	 * during normal operation and is always called from task context.
> +	 * This allows drivers to operate in their own locking context for
> +	 * synchronizing output to the hardware.
> +	 */

The description is not bad. It seems to include all the important
information. Except that I still needed to scratch my head to
understand the real purpose and the rules.

Namely, the following sentences are kind of vague and raise questions
which I had in the past, before I understood all the relations:

   1. "Any necessary driver synchronization should have been performed
       by the device_lock() callback."

       Q: Why do we need to take both device_lock() and the context then?
       Q: Why is the acquired context not enough?

   2. "This allows drivers to operate in their own locking context for
       synchronizing output to the hardware."

       Q: What exactly does this sentence mean?
       Q: What is the driver?
       Q: What are the output callbacks?
       Q: How exactly is this related to write_atomic() and write_thread()?
       Q: Is the context more strict or more relaxed, and in which way?

Also, I still keep in mind that nbcon_context_try_acquire() is not
safe with NBCON_PRIO_NORMAL when preemption is enabled. There is
the race discussed in the previous patchset, see
https://lore.kernel.org/r/ZiD3FNBZh_iMOVWY@pathway.suse.cz

I wonder if we could be more straightforward.

<my-take>
	/**
	 * @write_thread:
	 *
	 * NBCON callback to write out text in task context.
	 *
	 * This callback must be called only in task context with both
	 * device_lock() and the nbcon console acquired.
	 *
	 * The same rules for console ownership verification and unsafe
	 * sections handling apply as with write_atomic().
	 *
	 * The console ownership handling is necessary for synchronization
	 * against write_atomic() which is synchronized only via the context.
	 *
	 * The device_lock() serializes acquiring the console context
	 * with NBCON_PRIO_NORMAL. It is necessary when the device_lock()
	 * does not disable preemption. The console context locking is
	 * not able to detect a race in the following scenario:
	 *
	 *    1. [Task A] owns a context with NBCON_PRIO_NORMAL on [CPU X]
	 *	 and is scheduled out.
	 *
	 *    2. Another context takes over the lock with NBCON_PRIO_EMERGENCY
	 *	 and releases it.
	 *
	 *    3. [Task B] acquires a context with NBCON_PRIO_NORMAL on [CPU X]
	 *	 and is scheduled out.
	 *
	 *    4. [Task A] gets running on [CPU X] and sees that the console
	 *	 is still owned by a task on [CPU X] with NBCON_PRIO_NORMAL.
	 *	 It can't detect that it is actually owned by another task.
	 */
</my-take>
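
The root of the problem is what the ownership record contains. Roughly
(a sketch; the real struct nbcon_state packs this into a single atomic
word):

	struct nbcon_state {
		unsigned int prio;	/* priority of the owning context */
		unsigned int cpu;	/* CPU of the owning context */
		/* ... unsafe and takeover flags ... */
	};

So the check in step 4 compares only the CPU and the priority, which
match task A's own values even though task B is the real owner.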

My variant describes the purpose of device_lock() in quite a different
way. But I think that this is the real purpose and we must be clear
about it.

Sigh, I was not able to describe the race in a reasonably short way.
Maybe we should move this detailed explanation above
nbcon_context_try_acquire() and use just a reference here.
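
The reference could then be quite short, for example (just a sketch of
the idea):

	/**
	 * @write_thread:
	 *
	 * NBCON callback to write out text in task context.
	 *
	 * This callback must be called only in task context with both
	 * device_lock() and the nbcon console acquired. See the comment
	 * above nbcon_context_try_acquire() for the race which makes
	 * the device_lock() necessary.
	 */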

> +	void (*write_thread)(struct console *con, struct nbcon_write_context *wctxt);
> +
>  	/**
>  	 * @device_lock:
>  	 *
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -936,6 +944,120 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>  	return nbcon_context_exit_unsafe(ctxt);
>  }
>  
> +/**
> + * nbcon_kthread_should_wakeup - Check whether a printer thread should wakeup
> + * @con:	Console to operate on
> + * @ctxt:	The nbcon context from nbcon_context_try_acquire()
> + *
> + * Return:	True if the thread should shutdown or if the console is
> + *		allowed to print and a record is available. False otherwise.
> + *
> + * After the thread wakes up, it must first check if it should shutdown before
> + * attempting any printing.
> + */
> +static bool nbcon_kthread_should_wakeup(struct console *con, struct nbcon_context *ctxt)
> +{
> +	bool ret = false;
> +	short flags;
> +	int cookie;
> +
> +	if (kthread_should_stop())
> +		return true;
> +
> +	cookie = console_srcu_read_lock();
> +
> +	flags = console_srcu_read_flags(con);
> +	if (console_is_usable(con, flags)) {
> +		/* Bring the sequence in @ctxt up to date */
> +		ctxt->seq = nbcon_seq_read(con);
> +
> +		ret = prb_read_valid(prb, ctxt->seq, NULL);
> +	}
> +
> +	console_srcu_read_unlock(cookie);
> +	return ret;
> +}
> +
> +/**
> + * nbcon_kthread_func - The printer thread function
> + * @__console:	Console to operate on
> + *
> + * Return:	0
> + */
> +static int nbcon_kthread_func(void *__console)
> +{
> +	struct console *con = __console;
> +	struct nbcon_write_context wctxt = {
> +		.ctxt.console	= con,
> +		.ctxt.prio	= NBCON_PRIO_NORMAL,
> +	};
> +	struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> +	short con_flags;
> +	bool backlog;
> +	int cookie;
> +	int ret;
> +
> +wait_for_event:
> +	/*
> +	 * Guarantee this task is visible on the rcuwait before
> +	 * checking the wake condition.
> +	 *
> +	 * The full memory barrier within set_current_state() of
> +	 * ___rcuwait_wait_event() pairs with the full memory
> +	 * barrier within rcuwait_has_sleeper().
> +	 *
> +	 * This pairs with rcuwait_has_sleeper:A and nbcon_kthread_wake:A.
> +	 */
> +	ret = rcuwait_wait_event(&con->rcuwait,
> +				 nbcon_kthread_should_wakeup(con, ctxt),
> +				 TASK_INTERRUPTIBLE); /* LMM(nbcon_kthread_func:A) */
> +
> +	if (kthread_should_stop())
> +		return 0;
> +
> +	/* Wait was interrupted by a spurious signal, go back to sleep. */
> +	if (ret)
> +		goto wait_for_event;
> +
> +	do {
> +		backlog = false;
> +

I am not sure how obvious it is that we take the lock around the entire
operation. In principle, the console could not disappear, so it might
look like the lock is not really necessary. But it actually plays an
important role when stopping/suspending the console. I would add
a comment:

		/*
		 * Keep the read lock around the entire operation so that
		 * synchronize_srcu() could prove that the kthread stopped
		 * or suspended printing.
		 */
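
For context, this is roughly what the stopping side does (from memory,
along the lines of console_stop(); not part of this patch):

	/*
	 * Sketch: after the console is marked unusable, synchronize_srcu()
	 * waits until no reader, including the kthread loop here, is
	 * inside its SRCU read-side section. Afterwards the kthread is
	 * guaranteed not to print via this console any longer.
	 */
	console_srcu_write_flags(con, con->flags & ~CON_ENABLED);
	synchronize_srcu(&console_srcu);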

> +		cookie = console_srcu_read_lock();
> +
> +		con_flags = console_srcu_read_flags(con);
> +
> +		if (console_is_usable(con, con_flags)) {
> +			unsigned long lock_flags;
> +
> +			con->device_lock(con, &lock_flags);
> +
> +			/*
> +			 * Ensure this stays on the CPU to make handover and
> +			 * takeover possible.
> +			 */
> +			cant_migrate();
> +
> +			if (nbcon_context_try_acquire(ctxt)) {
> +				/*
> +				 * If the emit fails, this context is no
> +				 * longer the owner.
> +				 */
> +				if (nbcon_emit_next_record(&wctxt, false)) {
> +					nbcon_context_release(ctxt);
> +					backlog = ctxt->backlog;
> +				}
> +			}
> +
> +			con->device_unlock(con, lock_flags);
> +		}
> +
> +		console_srcu_read_unlock(cookie);
> +
> +	} while (backlog);

Thinking out loud:

We do not check kthread_should_stop() in the loop. It means that the
kthread would flush all pending messages before stopping, but only
when it is already inside the loop. It would not flush the messages
in the following scenario:

CPU0					CPU1

printk("Unregistering console\n");
  nbcon_wake_threads();
     irq_work_queue(&con->irq_work);

kthread_stop(con->kthread);

					nbcon_kthread_func()
					  rcuwait_wait_event()
					    nbcon_kthread_should_wakeup()

					  if (kthread_should_stop())
					    // true
					    return 0

Result: The kthread did not flush the pending messages in this case.

I do not have a strong opinion. Well, it would make sense to flush
all messages before stopping the kthread.

Maybe we should move the kthread_should_stop() check here.

Maybe we should move the flushing loop into a separate function
and the kthread might look like:

	do {
		rcuwait_wait_event(&con->rcuwait,
				   nbcon_kthread_should_wakeup(con, ctxt),
				   TASK_INTERRUPTIBLE); /* LMM(nbcon_kthread_func:A) */

		nbcon_kthread_flush();
	} while (!kthread_should_stop());

Note that I did not add the check for the spurious signal. It looks
like just an optimization and, IMHO, it is not worth the code churn.
Also, a spurious signal does not mean that there are no
pending messages. And if we want to flush everything before exiting...
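
For completeness, the separated helper could be just the loop body
quoted above moved into a function. A sketch, with nbcon_kthread_flush()
being only a placeholder name and the console and write context passed
in explicitly:

	static void nbcon_kthread_flush(struct console *con,
					struct nbcon_write_context *wctxt)
	{
		struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
		short con_flags;
		bool backlog;
		int cookie;

		do {
			backlog = false;

			/*
			 * Keep the read lock around the entire operation so
			 * that synchronize_srcu() could prove that the
			 * kthread stopped or suspended printing.
			 */
			cookie = console_srcu_read_lock();

			con_flags = console_srcu_read_flags(con);

			if (console_is_usable(con, con_flags)) {
				unsigned long lock_flags;

				con->device_lock(con, &lock_flags);

				/*
				 * Ensure this stays on the CPU to make
				 * handover and takeover possible.
				 */
				cant_migrate();

				if (nbcon_context_try_acquire(ctxt)) {
					/*
					 * If the emit fails, this context is
					 * no longer the owner.
					 */
					if (nbcon_emit_next_record(wctxt, false)) {
						nbcon_context_release(ctxt);
						backlog = ctxt->backlog;
					}
				}

				con->device_unlock(con, lock_flags);
			}

			console_srcu_read_unlock(cookie);
		} while (backlog);
	}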

Anyway, we probably should document the (desired) behavior in the function
description.

> +
> +	goto wait_for_event;
> +}
> +
>  /* Track the nbcon emergency nesting per CPU. */
>  static DEFINE_PER_CPU(unsigned int, nbcon_pcpu_emergency_nesting);
>  static unsigned int early_nbcon_pcpu_emergency_nesting __initdata;
> @@ -1332,6 +1454,63 @@ void nbcon_cpu_emergency_flush(void)
>  	}
>  }
>  
> +/*
> + * nbcon_kthread_stop - Stop a printer thread
> + * @con:	Console to operate on
> + */
> +static void nbcon_kthread_stop(struct console *con)
> +{
> +	lockdep_assert_console_list_lock_held();
> +
> +	if (!con->kthread)
> +		return;
> +
> +	kthread_stop(con->kthread);
> +	con->kthread = NULL;
> +}
> +
> +/**
> + * nbcon_kthread_create - Create a printer thread
> + * @con:	Console to operate on
> + *
> + * If it fails, let the console proceed. The atomic part might
> + * be usable and useful.
> + */
> +void nbcon_kthread_create(struct console *con)
> +{
> +	struct task_struct *kt;
> +
> +	lockdep_assert_console_list_lock_held();
> +
> +	if (!(con->flags & CON_NBCON) || !con->write_thread)
> +		return;
> +
> +	if (con->kthread)
> +		return;
> +
> +	/*
> +	 * Printer threads cannot be started as long as any boot console is
> +	 * registered because there is no way to synchronize the hardware
> +	 * registers between boot console code and regular console code.
> +	 */
> +	if (have_boot_console)
> +		return;
> +
> +	kt = kthread_run(nbcon_kthread_func, con, "pr/%s%d", con->name, con->index);
> +	if (IS_ERR(kt)) {
> +		con_printk(KERN_ERR, con, "failed to start printing thread\n");
> +		return;

I am a bit surprised that we ignore the error here.

Hmm, it likely works in this patch because the legacy code would still
flush the console when con->kthread is not assigned.

But it would stop working later when the legacy loop is disabled
by the global @printk_threads_enabled variable.


> +	}
> +

Thinking out loud:

The kthread is running and may have already started processing messages
at this moment. But con->kthread is not set yet.

All this is done under @console_list_lock. It only guarantees that
the struct console would not disappear while some code accesses it under
console_srcu_read_lock().

The value is used in console_flush_all() and in nbcon_device_release()
to decide whether to flush the console directly or not.

I first thought that console_flush_all() and nbcon_device_release()
could flush the messages using write_atomic() in parallel with
the kthread and that it might create the race with NBCON_PRIO_NORMAL
context ownership.

But both calls will ignore this console until it is added into
@console_list which is done later under con->device_lock().

Not to mention that:

  + console_flush_all() acquires the nbcon context with interrupts
    disabled => race prevented

  + nbcon_device_release() acquires the nbcon context under device
    lock => serialized with the kthread

So we are on the safe side (double guaranteed). Well, it took me quite
some time to realize this. It might be worth a comment.

<proposal>
	/*
	 * Some users check con->kthread to decide whether to flush
	 * the messages directly using con->write_atomic(). Note that
	 * possible races with the kthread are double prevented.
	 *
	 * First, the users ignore this console until it is added into
	 * @console_list which is done under con->device_lock().
	 * Second, the calls would be additionally serialized by acquiring
	 * the console context.
	 */
</proposal>
> +	con->kthread = kt;
> +
> +	/*
> +	 * It is important that console printing threads are scheduled
> +	 * shortly after a printk call and with generous runtime budgets.
> +	 */
> +	sched_set_normal(con->kthread, -20);

I would prefer to move this into a separate patch. I feel that it
might be kind of controversial and subject to change in the future.
We should not hide this "tiny" detail in this mega patch ;-)

That said, the low "nice" value makes perfect sense to me.
And it makes me feel comfortable.

> +}
> +
>  /**
>   * nbcon_alloc - Allocate buffers needed by the nbcon console
>   * @con:	Console to allocate buffers for
> @@ -1458,6 +1639,7 @@ void nbcon_device_release(struct console *con)
>  	 */
>  	cookie = console_srcu_read_lock();
>  	if (console_is_usable(con, console_srcu_read_flags(con)) &&
> +	    !con->kthread &&

This would deserve updating the comment like in nbcon_atomic_flush_pending_con().

>  	    prb_read_valid(prb, nbcon_seq_read(con), NULL)) {
>  		__nbcon_atomic_flush_pending_con(con, prb_next_reserve_seq(prb), false);
>  	}

Best Regards,
Petr
