lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87czisbotz.fsf@jogness.linutronix.de>
Date:   Fri, 11 Mar 2022 14:34:40 +0106
From:   John Ogness <john.ogness@...utronix.de>
To:     Petr Mladek <pmladek@...e.com>
Cc:     Sergey Senozhatsky <senozhatsky@...omium.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        linux-kernel@...r.kernel.org,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH printk v1 11/13] printk: reimplement console_lock for
 proper kthread support

On 2022-03-11, Petr Mladek <pmladek@...e.com> wrote:
>>     console_unlock()
>>     {
>>  	  [...]
>>  	  if (may_schedule)
>>  	      retry = console_lock_reacquire();
>>  	  else
>>  	      retry = console_trylock();
>>     }
>> 

[...]

> OK, it means that the main problem here _is not_ the scheduling context,
> console_lock() vs. console_trylock(). The main problem _is_ the direct
> printing vs. the offload to kthreads.
>
> Of course, the context is important. It affects how we could re-take
> the lock. But the main problem is the printing mode. We must make sure
> that:
>
>     1. someone is printing pending messages when the direct mode is needed

console_trylock() causes difficulties here because it will fail if any
kthread is active. It is an example of direct mode failure. But is that
really any different than current mainline console_trylock() failing
because a console_lock() context is active (and possibly not scheduled
on a CPU)?

>     2. kthreads are woken and can enter the printing mode when the direct
>        mode is disabled.

This happens at the end of vprintk_emit() and within __console_unlock(),
regardless if the printk() was running in direct mode or not.

> Will console_lock_reacquire() really help here?
>
> The API theoretically helps in direct mode when the lock was taken
> via console_lock().

console_lock_reacquire() only exists for the console_lock() case.

> But it does not help when the lock was taken
> via console_trylock() from printk(). It might mean that
> the forward progress might not be guaranteed in the direct mode
> (early boot, panic, ...).

How is the console_trylock() case different from current mainline now?
As I mentioned above, the kthreads can block console_trylock(), but so
can a console_lock() currently in mainline.

> Hmm, the forward progress seems to be guaranteed in the direct
> mode most of the time. console_trylock() can take over
> the atomic counter because console kthreads are not allowed
> to enter the printing mode in this case.
>
> I used "most of the time" because there might be races when
> the mode is switched. "printk_direct" is an atomic variable.
> CON_DIRECT is set under con->mutex but console_trylock()
> does not take the mutex...

You are mixing issues here. If CON_DIRECT is set, there is already a
console_lock() in progress, so console_trylock() fails on @console_sem.

> There are also races when the offload to consoles kthreads
> is allowed. For example, console_trylock() might block
> console_kthread_printing_tryenter().

I do not see how that is a problem. If any context has the console lock
(no matter how it got that lock) then the kthreads are blocked.

If direct printing is enabled (from @printk_direct or @oops_in_progress
or @system_state != SYSTEM_RUNNING), the task with the console lock will
print out *all* the remaining records.

If direct printing is not enabled, the kthreads are woken in
__console_unlock().

> Sigh, I am afraid that we have non-trivial problems
> to guarantee that all messages will be printed:
>
>      + races when switching between direct mode
>        and offload to kthreads. It might cause
>        stall in both modes.

Scheduable contexts holding locks can cause stalls. We have that same
problem with console_lock() in mainline now. The kernel provides
mechanisms to avoid such stalls (niceness, priorities, policies,
priority inheritance), but this is all problem-specific and must be
fine-tuned by the user if they are running workloads that are causing
problems. kthreads are not solving the reliability problem (and they
never will).

>      + console_trylock() races with
>        console_kthread_printing_tryenter().
>        It might put kthread into a sleep even when
>        it is supposed to print the message.

kthread is never _supposed_ to print a message. It is there to offload
direct printing. If console_trylock() (direct printing) wins, then that
is the context that does the printing.

> IMHO, console_lock_reacquire() does not help much here.
> We need to solve console_trylock() path anyway.

It preserves a consistent locking scenario for the console_lock()
path. That is all it is intended to do.

> I think that the solution might be:
>
>    + make sure that the state of "printk_direct" atomic variable
>      is enough to distinguish about the mode.

The printk subsystem does not have absolute control over which
task/context is doing the actual printing. Currently in mainline there
are printers that handoff to waiters and even some that do not print at
all because there is already waiters. Adding kthreads introduces a new
task that can print. But there still is no real control about who
prints. The important thing is that there is some context or runnable
task that is running the printing code (whether scheduled or not).

>    + always wakeup() console kthreads after console_trylock()
>      to handle the possible race with
>      console_kthread_printing_tryenter()

I do not understand. If console_trylock() wins, it already wakes the
kthreads on __console_unlock(). If console_trylock() loses, the kthreads
are already running.

John

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ