Message-ID: <20171214221831.3ead0298@vmware.local.home>
Date: Thu, 14 Dec 2017 22:18:31 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Cc: Tejun Heo <tj@...nel.org>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Petr Mladek <pmladek@...e.com>, Jan Kara <jack@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Rafael Wysocki <rjw@...ysocki.net>,
Pavel Machek <pavel@....cz>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCHv6 00/12] printk: introduce printing kernel thread

On Fri, 15 Dec 2017 11:10:24 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com> wrote:
> Steven, your approach works ONLY when we have the following preconditions:
>
> a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
> etc) context
>
> what guarantees that? what happens if there is NO non-atomic
> CPU, or that non-atomic CPU simply misses the console_owner != false
> point? are we going to conclude
>
> "if printk() doesn't work for you, it's because you are holding it wrong"?
>
>
> what if that non-atomic CPU does not call printk(), but instead
> it does console_lock()/console_unlock()? why there is no handoff?
>
>	CPU0				CPU1 ~ CPU10
>					in atomic contexts [!]. ping-ponging console_sem
>					ownership to each other. while what they really
>					need to do is to simply up() and let CPU0 to
>					handle it.
>					printk
>	console_lock()
>	 schedule()
>					...
>					printk
>					printk
>					...
>					printk
>					printk
>
>					up()
>
>	// woken up
>	console_unlock()
> why do we make an emphasis on fixing vprintk_printk()?
Where do we do the above? And has this been proven to be an issue? If
it has, I think it's a separate issue from what I proposed. What I
proposed fixes the case where lots of CPUs are doing printks, and
only one actually does the write.
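
To make the "lots of CPUs doing printks, one doing the write" case concrete, here is a toy userspace model. It is not kernel code, and every name in it (simulate, handoff, written) is made up for illustration. It assumes a simplified version of the owner/waiter idea: each step, every CPU that is not busy with the console logs one message; the first CPU that fails to get the console while someone owns it becomes the single waiter; the owner writes one message and then passes ownership to the waiter if there is one.

```python
from collections import deque


def simulate(num_cpus, msgs_per_cpu, handoff):
    """Toy model of console writes; returns messages written per CPU.

    Assumptions (not taken from the kernel source): time advances in
    lock-step, the owner writes exactly one message per step, and at
    most one waiter can exist at a time.
    """
    remaining = [msgs_per_cpu] * num_cpus  # printk() calls each CPU still makes
    log = deque()                          # pending messages (log buffer)
    owner = None                           # CPU currently writing to the console
    waiter = None                          # CPU spinning to take over
    written = [0] * num_cpus

    while any(remaining) or log:
        # Phase 1: every CPU that is not busy with the console logs a message.
        for cpu in range(num_cpus):
            if cpu == owner or cpu == waiter or remaining[cpu] == 0:
                continue
            remaining[cpu] -= 1
            log.append(cpu)
            if owner is None:
                owner = cpu        # console was free: take it
            elif handoff and waiter is None:
                waiter = cpu       # console busy: wait for a handoff
            # otherwise the message is simply left for the owner to write

        # Phase 2: the owner writes one pending message.
        if owner is not None and log:
            log.popleft()
            written[owner] += 1

        # Phase 3: hand off to the waiter, or release when nothing is pending.
        if waiter is not None:
            owner, waiter = waiter, None
        elif not log:
            owner = None

    return written


if __name__ == "__main__":
    print("no handoff:", simulate(4, 5, handoff=False))
    print("handoff:   ", simulate(4, 5, handoff=True))
```

Under these assumptions, with 4 CPUs logging 5 messages each and no handoff, the first CPU ends up writing all 20 messages. With handoff enabled the writes are spread over several CPUs. The model says nothing about atomic versus non-atomic contexts, which is the part still under discussion here.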
>
>
> b) non-atomic CPU sees console_owner set (which is set for a very short
> period of time)
>
> again. what if that non-atomic CPU does not see console_owner?
> "don't use printk()"?
May I ask, why are we doing the printk in the first place?
>
> c) the task that is looping in console_unlock() sees non-atomic CPU when
> console_owner is set.
I haven't looked at the latest code, but my last patch didn't care
about "atomic" and "non-atomic" issues, because I don't know if that is
indeed an issue in the real world.
>
>
> IOW, we need to have
>
>
> the right CPU (a) at the very right moment (b && c) doing the very right thing.
>
>
> * and the "very right moment" is tiny and additionally depends
> on a foreign CPU [the one that is looping in console_unlock()].
>
>
>
> a simple question - how is that going to work for everyone? are we
> "fixing" a small fraction of possible use-cases?
Still sounds like you are ;-)
>
>
>
> Steven, I thought we reached the agreement [**] that the solution we should
> be working on is a combination of printk_kthread and console_sem hand
> off. Simply because it adds the missing "there is a non-atomic CPU wishing
> to console_unlock()" thing.
>
> lkml.kernel.org/r/20171108162813.GA983427@...big577.frc2.facebook.com
>
> https://marc.info/?l=linux-kernel&m=151011840830776&w=2
> https://marc.info/?l=linux-kernel&m=151015141407368&w=2
> https://marc.info/?l=linux-kernel&m=151018900919386&w=2
> https://marc.info/?l=linux-kernel&m=151019815721161&w=2
> https://marc.info/?l=linux-kernel&m=151020275921953&w=2
> ** https://marc.info/?l=linux-kernel&m=151020404622181&w=2
> ** https://marc.info/?l=linux-kernel&m=151020565222469&w=2
I'm still fine with the hybrid approach, but I want to see a problem
first before we fix it.
>
>
> what am I missing?
The reproducer. Let Tejun do the test with just my patch, and if it
still has problems, then we can add more logic to the code. I like to
take things one step at a time. What I'm seeing is that there was a
problem that could be solved with my solution, but during this process,
people have found hundreds of theoretical problems and started down the
path to solve each of them. I want to see a real bug, before we go down
the path of having to have external threads and such, to solve a bug
that we don't really know exists yet.
-- Steve