linux-kernel - Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

Date:   Fri, 15 Dec 2017 15:52:05 +0900
From:   Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Tejun Heo <tj@...nel.org>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Rafael Wysocki <rjw@...ysocki.net>,
        Pavel Machek <pavel@....cz>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

On (12/15/17 14:06), Sergey Senozhatsky wrote:
[..]
> > Where do we do the above? And has this been proven to be an issue?
> 
> um... hundreds of cases.
> 
> deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> happening at the same moment + NMI backtraces from all the CPUs (more
> than 3 cpus) that follows the lockups, over not-so-fast serial console.
> exactly the bug report I received two days ago. so which one of the CPUs
> here is a good candidate to successfully emit all of the pending logbuf
> entries? none. all of them either have local IRQs disabled, or dump_stack()
> from either backtrace IPI or backtrace NMI (depending on the configuration).

and, Steven, one more thing. wondering what's your opinion.

suppose we have console_owner hand off enabled, 1 non-atomic CPU doing
printk-s and several atomic CPUs doing printk-s. is the proposed hand off
scheme really useful in this case? CPUs will now

a) print their lines (a potentially slow call_console_drivers())

and

b) spin in vprintk_emit on console_owner with local IRQs disabled
   waiting for either non-atomic printk CPU or another atomic CPU
   to finish printing its line (call_console_drivers()) and to hand
   off printing. so the current CPU, after busy-waiting for a foreign CPU's
   call_console_drivers(), will go and do its own call_console_drivers().
   which, time-wise, simply doubles (roughly) the amount of time that
   CPU spends in printk()->console_unlock(). agreed?

   if previously we could have a case where the non-atomic printk CPU would
   grab the console_sem and print all of the atomic printk CPUs' messages
   first, and then its own messages, so the atomic printk CPUs would only do
   log_store(), now we will have CPUs that call_console_drivers() and
   spin on the console_sem owner waiting for call_console_drivers() on a
   foreign CPU  [not all of them at once: it's one CPU doing the print out
   and one CPU spinning on console_owner. but overall I think all CPUs will
   experience that spin on console_sem waiting for call_console_drivers()
   and will then do their own call_console_drivers()].
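
to put rough numbers on (a) and (b), here is a toy cost model. it is only a
sketch, not kernel code: time_in_printk(), print_cost and the zero cost for
log_store() are assumptions made up for illustration, and the wait is taken
at its worst case, i.e. the waiter starts spinning right when the owner
enters call_console_drivers().

```python
def time_in_printk(print_cost, handoff):
    """Toy per-printk() cost for an atomic CPU, in arbitrary time units.

    print_cost: assumed cost of one call_console_drivers() call.
    handoff:    True  -> the CPU waits for the foreign CPU's print, then
                         prints its own line.
                False -> a non-atomic CPU holding console_sem prints for
                         everyone, so the atomic CPU only does log_store().
    """
    log_store_cost = 0  # assumption: negligible next to a slow console
    if not handoff:
        return log_store_cost
    wait_for_owner = print_cost  # worst case: spin for the whole foreign print
    own_print = print_cost       # then do our own call_console_drivers()
    return log_store_cost + wait_for_owner + own_print


if __name__ == "__main__":
    print("with hand-off:   ", time_in_printk(10, handoff=True))
    print("without hand-off:", time_in_printk(10, handoff=False))
```

under these assumptions the atomic CPU goes from paying only for log_store()
to paying for two call_console_drivers() worth of time per printk().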

even the two-CPU case is not so simple anymore. see below.

- first, assume one CPU is atomic and one is non-atomic.
- second, assume that both CPUs are atomic CPUs, and go through it again.

CPU0                            CPU1

printk()                        printk()
 log_store()
                                 log_store()
 console_unlock()
  set console_owner
                                 sees console_owner
                                 sets console_waiter
                                 spin
  call_console_drivers()
  sees console_waiter
   break

printk()
 log_store()
                                 console_unlock()
                                  set console_owner
 sees console_owner
 sets console_waiter
 spin
                                 call_console_drivers()
                                 sees console_waiter
                                  break

                                printk()
                                 log_store()
 console_unlock()
  set console_owner
                                 sees console_owner
                                 sets console_waiter
                                 spin
  call_console_drivers()
  sees console_waiter
   break

printk()
 log_store()
                                 console_unlock()
                                  set console_owner
 sees console_owner
 sets console_waiter
 spin

....
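
the trace above can be replayed with a small model. again, this is a
hypothetical sketch and not the kernel implementation: ping_pong() and its
counters are invented for illustration, and it assumes that the CPU which
hands off immediately does another printk() and becomes the next waiter,
as in the diagram.

```python
def ping_pong(rounds, print_cost=1):
    """Replay the two-CPU hand-off trace for a number of rounds.

    Each round the owner prints one line while the waiter spins; then the
    waiter becomes the owner and the old owner, doing another printk(),
    becomes the new waiter.
    """
    spin = [0, 0]      # time each CPU spent busy-waiting on console_owner
    printing = [0, 0]  # time each CPU spent in call_console_drivers()
    owner, waiter = 0, 1
    for _ in range(rounds):
        # the waiter set console_waiter before the owner started printing,
        # so it spins for the whole duration of the owner's print
        printing[owner] += print_cost
        spin[waiter] += print_cost
        # hand off: roles swap
        owner, waiter = waiter, owner
    return spin, printing


if __name__ == "__main__":
    spin, printing = ping_pong(rounds=4)
    print("spin:", spin, "printing:", printing)
```

in this model every unit of time a CPU spends printing is matched by a unit
the other CPU spends spinning, so neither CPU ever gets out of
printk()->console_unlock() as long as both keep calling printk().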

that "wait for call_console_drivers() on another CPU and then do
its own call_console_drivers()" pattern does look dangerous. the
benefit of hand-off is really fragile sometimes, isn't it?

	-ss