Message-ID: <20171215065205.GB468@jagdpanzerIV>
Date: Fri, 15 Dec 2017 15:52:05 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Tejun Heo <tj@...nel.org>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Rafael Wysocki <rjw@...ysocki.net>,
Pavel Machek <pavel@....cz>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
On (12/15/17 14:06), Sergey Senozhatsky wrote:
[..]
> > Where do we do the above? And has this been proven to be an issue?
>
> um... hundreds of cases.
>
> deep-stack spin_lock_irqsave() lockup reports from multiple CPUs (3 cpus)
> happening at the same moment + NMI backtraces from all the CPUs (more
> than 3 cpus) that follows the lockups, over not-so-fast serial console.
> exactly the bug report I received two days ago. so which one of the CPUs
> here is a good candidate to successfully emit all of the pending logbuf
> entries? none. all of them either have local IRQs disabled, or dump_stack()
> from either backtrace IPI or backtrace NMI (depending on the configuration).
and, Steven, one more thing. wondering what's your opinion.

suppose we have console_owner hand off enabled, 1 non-atomic CPU doing
printk-s and several atomic CPUs doing printk-s. is the proposed hand off
scheme really useful in this case? each CPU will now

a) print its own lines (a potentially slow call_console_drivers())

and

b) spin in vprintk_emit on console_owner with local IRQs disabled,
   waiting for either the non-atomic printk CPU or another atomic CPU
   to finish printing its line (call_console_drivers()) and to hand
   off printing.

so the current CPU, after busy-waiting for a foreign CPU's
call_console_drivers(), will go and do its own call_console_drivers(),
which, time-wise, roughly doubles the amount of time that CPU spends
in printk()->console_unlock(). agreed?
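to make it a bit more concrete, this is roughly how I read the hand off
path (a simplified sketch, not the actual patch code; names, locking and
signatures are approximate, and local IRQs are assumed to be disabled by
the caller, as in vprintk_emit):

static struct task_struct *console_owner;	/* CPU currently printing */
static bool console_waiter;			/* somebody wants to take over */
static DEFINE_RAW_SPINLOCK(console_owner_lock);

/* owner side: one iteration of the console_unlock() printing loop */
static bool print_line_and_check_handoff(void)
{
	bool waiter;

	raw_spin_lock(&console_owner_lock);
	console_owner = current;
	raw_spin_unlock(&console_owner_lock);

	call_console_drivers();		/* potentially slow: serial console, etc. */

	raw_spin_lock(&console_owner_lock);
	waiter = READ_ONCE(console_waiter);
	console_owner = NULL;
	if (waiter)
		WRITE_ONCE(console_waiter, false);	/* releases the spinning CPU */
	raw_spin_unlock(&console_owner_lock);

	return waiter;		/* true: hand the console lock over and break */
}

/* waiter side: a printk() CPU that did not get console_sem */
static bool spin_for_handoff(void)
{
	bool spin = false;

	raw_spin_lock(&console_owner_lock);
	if (console_owner && console_owner != current && !console_waiter) {
		WRITE_ONCE(console_waiter, true);
		spin = true;
	}
	raw_spin_unlock(&console_owner_lock);

	if (!spin)
		return false;

	/* b) busy-wait, IRQs off, for the foreign call_console_drivers() */
	while (READ_ONCE(console_waiter))
		cpu_relax();

	/* the owner handed the console lock to us: a) now print our own lines */
	return true;
}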
previously we could have a case when the non-atomic printk CPU would
grab the console_sem and print all of the atomic printk CPUs' messages
first, and then its own messages, so the atomic printk CPUs would do
just log_store(). now CPUs will both call_console_drivers() and spin
on the console_sem owner, waiting for call_console_drivers() on a
foreign CPU [not all of them at once: it's one CPU doing the print out
and one CPU spinning on console_owner. but overall I think every CPU
will experience that "spin on the console_sem owner waiting for its
call_console_drivers(), then do your own call_console_drivers()" cycle].
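for comparison, the old path for an atomic printk CPU was, very roughly
(again a simplified sketch, not the actual code; function names and
signatures trimmed down):

/* simplified sketch of the pre-hand-off behaviour */
static void printk_old_sketch(const char *text)
{
	log_store(text);		/* cheap: store the message in logbuf */

	if (console_trylock()) {
		/* we own console_sem: print *all* pending messages */
		console_unlock();
	} else {
		/*
		 * somebody else (e.g. the non-atomic printk CPU) holds
		 * console_sem and will print our message for us; we
		 * return right away -- no spinning, no
		 * call_console_drivers() on this CPU.
		 */
	}
}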
even the two-CPU case is not so simple anymore. see below.
- first, assume one CPU is atomic and one is non-atomic.
- second, assume that both CPUs are atomic CPUs, and go through it again.
CPU0                                    CPU1

printk()                                printk()
 log_store()
                                         log_store()
 console_unlock()
  set console_owner
                                         sees console_owner
                                         sets console_waiter
                                         spin
  call_console_drivers()
   sees console_waiter
   break
printk()
 log_store()
                                        console_unlock()
                                         set console_owner
 sees console_owner
 sets console_waiter
 spin
                                         call_console_drivers()
                                          sees console_waiter
                                          break
                                        printk()
                                         log_store()
console_unlock()
 set console_owner
                                         sees console_owner
                                         sets console_waiter
                                         spin
 call_console_drivers()
  sees console_waiter
  break
printk()
 log_store()
                                        console_unlock()
                                         set console_owner
 sees console_owner
 sets console_waiter
 spin
....
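(very rough, made-up numbers, just to illustrate the "doubles the time"
point above: on a 115200 baud serial console an 80-char line is ~800 bits,
so ~7ms to push out. in the ping-pong above each CPU then pays, per
message, ~7ms spinning on console_owner with local IRQs disabled plus
~7ms in its own call_console_drivers(), ~14ms total, where previously an
atomic printk CPU whose messages were printed by the console_sem holder
paid close to 0ms.)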
that "wait for call_console_drivers() on another CPU and then do
its own call_console_drivers()" pattern does look dangerous. the
benefit of hand-off is really fragile sometimes, isn't it?
-ss