linux-kernel - Re: [GIT PULL] printk for 4.13

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Thu, 6 Jul 2017 09:44:04 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Petr Mladek <pmladek@...e.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [GIT PULL] printk for 4.13

Hello,

On (07/05/17 11:23), Linus Torvalds wrote:
> On Mon, Jul 3, 2017 at 7:24 AM, Petr Mladek <pmladek@...e.com> wrote:
> >
> > - Store printk() messages into the main log buffer directly even in NMI
> >   when the lock is available. It is the best effort to print even large
> >   chunk of text. It is handy, for example, when all ftrace messages
> >   are printed during the system panic in NMI.
> 
> No, this is entirely buggered.
> 
> You can't just do
> 
>    raw_spin_is_locked(&logbuf_lock)
> 
> to test whether you can get the logbuf_lock and then call vprintk_default().
> 
> It's not just about deadlock avoidance, the code will call things like
> down_trylock() on the console semaphore, and that operation is
> fundamentally not NMI-safe. The semaphore count is literally protected
> by a irq-safe (BUT NOT NMI-SAFE!) semaphore spinlock.
>
> So now you can instead deadlock just on the internal console semaphore
> spinlock (ie somebody is doing "console_lock()", NMI comes in, *BOOM*.

you are right. but

PRINTK_NMI_DEFERRED_CONTEXT_MASK calls into vprintk_deferred(), which
does vprintk_emit(0, LOGLEVEL_SCHED, NULL, 0, fmt, args). and
vprintk_emit() LOGLEVEL_SCHED is nothing, but printk_deferred(), which
never calls into the scheduler and does not even try to lock console_sem.
the only lock it takes is logbuf_lock, and the idea behind those checks
in printk_nmi_enter() is to make sure that we can safely call
vprintk_deferred() from NMI on that CPU.

printk_safe context is per-CPU, so if we see at NMI entry that this_cpu
has PRINTK_SAFE_CONTEXT_MASK set, then it means that NMI is preempting
printk-safe context on this_cpu, which is already alarming, because we
switch to printk-safe context *mostly* in places where printk recursion
can deadlock us (console_sem up/down/try_down and logbuf_lock lock/unlock).

why *mostly*... because we are also in printk-safe context during the
call_console_drivers() call.

so this is why we also check raw_spin_is_locked(&logbuf_lock). by the
time we do raw_spin_is_locked(&logbuf_lock) we already know that
this_cpu is in printk-safe, so it is _potentially_ unsafe to call
vprintk_emit(), unless we are printk-safe for call_console_drivers(),
which means that logbuf_lock is not lockde by this_cpu.

so what printk_nmi_enter() does

- this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK

  if we are NOT in printk-safe then we know that from this CPU we can
  call printk_deferred().

  otherwise, we need to make sure that it is safe to call printk_deferred().

- raw_spin_is_locked(&logbuf_lock)

  if the lock is not locked -- we can printk_deferred() from that CPU.

  otherwise, we use printk-nmi buffer and flush the messages to logbuf
  later from irq_work.

does this make sense?

	-ss