linux-kernel - Re: [PATCH printk v1 11/13] printk: reimplement console

Message-ID: <87wnh14wp9.fsf@jogness.linutronix.de>
Date:   Thu, 10 Mar 2022 17:14:18 +0106
From:   John Ogness <john.ogness@...utronix.de>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH printk v1 11/13] printk: reimplement console_lock for
 proper kthread support

On 2022-03-10, Petr Mladek <pmladek@...e.com> wrote:
>    console_unlock()
>    {
> 	  [...]
> 	  if (may_schedule)
> 	      retry = console_trylock_sched();
> 	  else
> 	      retry = console_trylock();
>    }

I believe the main confusion comes from the function name I chose and
the poor function description. Using your above code idea and changing
to a more fitting name, I would suggest:

    console_unlock()
    {
 	  [...]
 	  if (may_schedule)
 	      retry = console_lock_reacquire();
 	  else
 	      retry = console_trylock();
    }

This console_lock_reacquire() acquires the console lock the same way
that console_lock() does it. The only reason we don't just use
console_lock() is that we want to perform a try on @console_sem. But
if we are successful, in the end, we have re-taken the console lock
exactly as console_lock() did before: @console_sem locked, kthreads
blocked by mutex.
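
As a rough illustration of that end state, here is a userspace model (not
the patch code). All model_* names are hypothetical, @console_sem is
approximated by a plain pthread mutex, and two consoles are assumed. The
point it tries to show is that the reacquire differs from the lock only in
the first step (try vs. wait) and leaves the same state behind on success.

```c
/* Userspace model only, not kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MODEL_NR_CONSOLES 2	/* assumption: two registered consoles */

/* Stands in for @console_sem (approximated by a mutex in this model). */
static pthread_mutex_t model_console_sem = PTHREAD_MUTEX_INITIALIZER;

/* One mutex per console; a printer thread would hold it while printing. */
static pthread_mutex_t model_con_lock[MODEL_NR_CONSOLES] = {
	PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER,
};

/* Block all printer threads by waiting for every per-console mutex. */
static void model_kthreads_block(void)
{
	for (int i = 0; i < MODEL_NR_CONSOLES; i++)
		pthread_mutex_lock(&model_con_lock[i]);
}

static void model_kthreads_unblock(void)
{
	for (int i = 0; i < MODEL_NR_CONSOLES; i++)
		pthread_mutex_unlock(&model_con_lock[i]);
}

/* Models console_lock(): wait for the semaphore, then block the kthreads. */
static void model_console_lock(void)
{
	pthread_mutex_lock(&model_console_sem);
	model_kthreads_block();
}

/*
 * Models console_lock_reacquire(): only *try* the semaphore. On success
 * the end state is the same as after model_console_lock().
 */
static bool model_console_lock_reacquire(void)
{
	if (pthread_mutex_trylock(&model_console_sem) != 0)
		return false;
	model_kthreads_block();
	return true;
}

static void model_console_unlock(void)
{
	model_kthreads_unblock();
	pthread_mutex_unlock(&model_console_sem);
}
```

In this model the reacquire can only fail on the semaphore try. Once that
succeeds, blocking the printer threads waits for them instead of failing.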

You say this creates deadlock potential, but I do not see how that could
be. We are in the same context and locking the same way we did before.

But my primary concern is not the naming or workarounds or confusing
APIs. So we should not let ourselves be diverted by that aspect.

My primary concern is the technical difference when a schedulable
context reacquires via atomic counter (which fails if any kthread is
active) vs. reacquiring via mutex (which never fails).
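
The atomic counter side of this can be sketched in userspace with C11
atomics. This is a model under assumptions, not the code from the series:
the model_* names and the counter encoding (positive for active printer
threads, 0 for idle, -1 for blocked) are made up for illustration.

```c
/* Userspace model only, not kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Assumed encoding:
 *   > 0  that many printer threads are currently printing
 *     0  no printer thread is active
 *    -1  printer threads are blocked by a direct printer
 */
static atomic_int model_kthreads_active;

/* A printer thread announces that it starts printing; refused while blocked. */
static bool model_kthread_enter(void)
{
	int cur = atomic_load(&model_kthreads_active);

	while (cur >= 0) {
		/* On failure, cur is reloaded and the blocked check is repeated. */
		if (atomic_compare_exchange_weak(&model_kthreads_active,
						 &cur, cur + 1))
			return true;
	}
	return false;
}

/* A printer thread announces that it finished printing. */
static void model_kthread_exit(void)
{
	atomic_fetch_sub(&model_kthreads_active, 1);
}

/*
 * Atomic variant of blocking the printer threads: never waits, and fails
 * if any printer thread is active at that moment.
 */
static bool model_atomic_tryblock(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&model_kthreads_active,
					      &expected, -1);
}

static void model_atomic_unblock(void)
{
	atomic_store(&model_kthreads_active, 0);
}
```

With one printer thread active, model_atomic_tryblock() returns false and
the caller walks away without the lock. A mutex-based block in the same
situation would instead sleep until that thread is done and then succeed.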

The reason for the reacquire is that (during direct printing) we see
that a new record appeared and we need to make sure it gets printed
(because other direct printers may have aborted, expecting us to print
it).

This scenario is only interesting if kthread printers exist because
otherwise @console_sem is enough to handle the direct printing.

So the questions are:

1. Is it OK to assume the new record will be printed if any kthread is
active? If yes, then it is enough to use the atomic counter.

2. Or, since we are responsible for direct printing, do we want to be
certain that the record is printed by printing it ourselves? If yes,
then we must block all the kthreads and perform the printing directly to
all the consoles. This requires the mutex approach.

IMHO #1 relies heavily on kthreads waking up and printing (even
though the printk caller requested direct printing), whereas #2 will
cause direct printers to more actively print (possibly printing more
than was requested).

I prefer to go the route of #2 because it is more conservative. IMHO,
when direct printing becomes active, we really should make a strong
effort to direct print.

Having now stated the issues (hopefully clearly), I will not fight for
#2. If you say it is enough to rely on the kthreads in this scenario,
then I will implement the atomic counter solution for my v2.

John