linux-kernel - Re: [PATCH v2] printk/nmi: avoid direct printk()-s from __printk_nmi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160831014441.GA472@swordfish>
Date:   Wed, 31 Aug 2016 10:44:41 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
        linux-kernel@...r.kernel.org,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [PATCH v2] printk/nmi: avoid direct printk()-s from
 __printk_nmi_flush()

Hello,

On (08/30/16 15:03), Andrew Morton wrote:
> > __printk_nmi_flush() can be called from nmi_panic(), therefore it has to
> > test whether it's executed in NMI context and thus must route the messages
> > through deferred printk() or via direct printk().
> 
> Why?  What misbehaviour does the current code cause?

the reasoning behind the `if in_nmi()' in print_nmi_seq_line()

       if (in_nmi())
               printk_deferred("%.*s", (end - start) + 1, buf);
       else
               printk("%.*s", (end - start) + 1, buf);

was as follows (per Petr's commit message)

:    In NMI context, printk() messages are stored into per-CPU buffers to
:    avoid a possible deadlock.  They are normally flushed to the main ring
:    buffer via an IRQ work.  But the work is never called when the system
:    calls panic() in the very same NMI handler.
:
:    This patch tries to flush NMI buffers before the crash dump is
:    generated.  In this case it does not risk a double release and bails out
:    when the logbuf_lock is already taken.  The aim is to get the messages
:    into the main ring buffer when possible.  It makes them better
:    accessible in the vmcore.
:
:    Then the patch tries to flush the buffers second time when other CPUs
:    are down.  It might be more aggressive and reset logbuf_lock.  The aim
:    is to get the messages available for the consequent kmsg_dump() and
:    console_flush_on_panic() calls.
:
:    The patch causes vprintk_emit() to be called even in NMI context again.
:    But it is done via printk_deferred() so that the console handling is
:    skipped.  Consoles use internal locks and we could not prevent a
:    deadlock easily.  They are explicitly called later when the crash dump
:    is not generated, see console_flush_on_panic().

doing pr_err() and pr_cont() defeats the purpose of print_nmi_seq_line(),
because in the worst case it can do something like this:

  vprintk_emit()
   /* console_trylock() */
   console_unlock()
    call_console_drivers()
     foo_write()   // possibly locked

the tricky moment here is -- if the console semaphore is unlocked and
NMI can successfully console_trylock(), then how any of underlying serial
console driver's locks can be taken? one possibility is:

	CPU0

SyS_ioctl
 do_vfs_ioctl
  tty_ioctl
   n_tty_ioctl
    tty_mode_ioctl
     set_termios
      tty_set_termios
       uart_set_termios
        uart_change_speed
         FOO_serial_set_termios
          spin_lock_irqsave(&port->lock)     // lock the output port
           ...
--> NMI
       nmi_panic()
        printk_nmi_flush_on_panic()
         __printk_nmi_flush()
          pr_cont()
           /* console_trylock() */
           console_unlock()
            call_console_drivers()
             foo_write()
              spin_lock_irqsave(&port->lock)  // already locked

as far as I can see, tty_ioctl() path does not lock console semaphore,
but grabs some tty locks instead.

	-ss