Message-ID: <20171215021024.GA11199@jagdpanzerIV>
Date: Fri, 15 Dec 2017 11:10:24 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Tejun Heo <tj@...nel.org>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Rafael Wysocki <rjw@...ysocki.net>,
Pavel Machek <pavel@....cz>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
linux-kernel@...r.kernel.org,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread
Hello,
On (12/14/17 10:11), Tejun Heo wrote:
> Hey, Steven.
>
> On Thu, Dec 14, 2017 at 12:55:06PM -0500, Steven Rostedt wrote:
> > Yes! Please create a reproducer, because I still don't believe there is
> > one. And it's all hand waving until there's an actual report that we can
> > lock up the system with my approach.
>
> Yeah, will do, but out of curiosity, Sergey and I already described
> what the root problem was and you didn't really seem to take that. Is
> that because the explanation didn't make sense to you or us
> misunderstanding what your code does?
I second _everything_ that Tejun has said.
Steven, your approach works ONLY when we have the following preconditions:
a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
etc) context
what guarantees that? what happens if there is NO non-atomic
CPU, or if that non-atomic CPU simply misses the console_owner != false
point? are we going to conclude
"if printk() doesn't work for you, it's because you are holding it wrong"?
what if that non-atomic CPU does not call printk(), but instead
does console_lock()/console_unlock()? why is there no handoff?

    CPU0                    CPU1 ~ CPU10
                            in atomic contexts [!]. ping-ponging console_sem
                            ownership to each other. while what they really
                            need to do is to simply up() and let CPU0 to
                            handle it.

                            printk
    console_lock()
     schedule()
                            ...
                            printk
                            printk
                            ...
                            printk
                            printk

                            up()

    // woken up
    console_unlock()

why do we make an emphasis on fixing vprintk_printk()?
b) non-atomic CPU sees console_owner set (which is set for a very short
period of time)
again. what if that non-atomic CPU does not see console_owner?
"don't use printk()"?
c) the task that is looping in console_unlock() sees non-atomic CPU when
console_owner is set.
IOW, we need to have
the right CPU (a) at the very right moment (b && c) doing the very right thing.
* and the "very right moment" is tiny and additionally depends
on a foreign CPU [the one that is looping in console_unlock()].
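
To make the conjunction above concrete, here is a minimal sketch in plain userspace C. It is not kernel code: the struct, its field names, and both functions are hypothetical, and it treats (a), (b) and (c) as independent yes/no flags, which is a simplification. It says nothing about how likely each condition is in practice, only that all three have to hold at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one instant in time; none of these names exist
 * in the kernel. */
struct snapshot {
	bool safe_cpu_in_printk;   /* (a) a non-atomic CPU is calling printk() */
	bool waiter_sees_owner;    /* (b) that CPU observes console_owner set */
	bool owner_sees_waiter;    /* (c) the console_unlock() looper notices it */
};

/* A handoff to a non-atomic CPU needs (a) && (b) && (c). */
static bool handoff_happens(const struct snapshot *s)
{
	assert(s != NULL);
	return s->safe_cpu_in_printk &&
	       s->waiter_sees_owner &&
	       s->owner_sees_waiter;
}

/* Enumerate all 8 combinations of the three flags and count how many
 * of them end in a handoff. */
static int count_handoffs(void)
{
	int n = 0;

	for (unsigned int m = 0; m < 8; m++) {
		struct snapshot s = {
			.safe_cpu_in_printk = (m & 1) != 0,
			.waiter_sees_owner  = (m & 2) != 0,
			.owner_sees_waiter  = (m & 4) != 0,
		};

		if (handoff_happens(&s))
			n++;
	}
	return n;
}
```

In this toy model only one of the eight flag combinations ends in a handoff; in every other case the CPU that is looping in console_unlock() keeps printing.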
a simple question - how is that going to work for everyone? are we
"fixing" a small fraction of possible use-cases?
Steven, I thought we reached the agreement [**] that the solution we should
be working on is a combination of printk_kthread and console_sem hand
off. Simply because it adds the missing "there is a non-atomic CPU wishing
to console_unlock()" thing.
lkml.kernel.org/r/20171108162813.GA983427@...big577.frc2.facebook.com
https://marc.info/?l=linux-kernel&m=151011840830776&w=2
https://marc.info/?l=linux-kernel&m=151015141407368&w=2
https://marc.info/?l=linux-kernel&m=151018900919386&w=2
https://marc.info/?l=linux-kernel&m=151019815721161&w=2
https://marc.info/?l=linux-kernel&m=151020275921953&w=2
** https://marc.info/?l=linux-kernel&m=151020404622181&w=2
** https://marc.info/?l=linux-kernel&m=151020565222469&w=2
what am I missing?
-ss