Message-ID: <20171109053358.GD775@jagdpanzerIV>
Date: Thu, 9 Nov 2017 14:33:58 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Tejun Heo <tj@...nel.org>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
linux-mm@...ck.org, xiyou.wangcong@...il.com,
dave.hansen@...el.com, hannes@...xchg.org, mgorman@...e.de,
mhocko@...nel.org, pmladek@...e.com, sergey.senozhatsky@...il.com,
vbabka@...e.cz
Subject: Re: [PATCH v3] printk: Add console owner and waiter logic to load balance console writes
On (11/09/17 00:06), Steven Rostedt wrote:
> What does safe context mean?
"safe" means that we don't cause lockups, stalls, sched throttlings, etc.
by doing console_unlock() from that context [task].
> Do we really want to allow the printk thread to sleep when there's more
> to print? What happens if there's a crash at that moment? How do we safely
> flush out all the data when the printk thread is sleeping?
printk-kthread does not schedule with the console_sem locked. that is
one of the changes to console_unlock() introduced with printk-kthread,
and one which we can't have without offloading.
> Now we could have something that uses both nicely. When the
> printk_thread wakes up (we need to figure out when to do that), then it
> could constantly take over.
certainly we can have a better hand-off scheme in printk-kthread patch set.
>
>    CPU1                                    CPU2
>    ----                                    ----
>  console_unlock()
>    start printing a lot
>    (more than one, wake up printk_thread)
>
>                                            printk thread wakes up
>
>                                            becomes the waiter
>
>    sees waiter hands off
>
>                                            starts printing
>
>  printk()
>    becomes waiter
>
>                                            sees waiter hands off
>                                            then becomes new waiter! <-- key
>
>    starts printing
>    sees waiter hands off
>                                            continues printing
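for reference, the owner/waiter sequence quoted above can be modelled as a
tiny single-threaded state machine. this is only a user-space sketch under
assumptions, not kernel code and not the code from the patch: the struct and
function names (console_model, try_become_waiter, print_one_step) are
hypothetical, and real spinning, locking and IRQ context are left out. it
only shows the rule "print one record, then hand off if somebody waits".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the console state; not a kernel structure. */
struct console_model {
	int owner;       /* id of the context currently printing, -1 if none */
	int waiter;      /* id of the context waiting for a hand-off, -1 if none */
	size_t pending;  /* records that still have to be printed */
};

/*
 * A context asks to take over printing. Only one waiter is allowed at a
 * time, and the current owner cannot wait for itself.
 */
static bool try_become_waiter(struct console_model *c, int id)
{
	assert(c != NULL);
	if (c->owner < 0 || c->owner == id || c->waiter >= 0)
		return false;
	c->waiter = id;
	return true;
}

/*
 * The owner prints at most one record, then checks for a waiter.
 * If there is one, ownership moves to it without the console being
 * released in between. Returns the owner after this step, or -1 once
 * the console has been released.
 */
static int print_one_step(struct console_model *c)
{
	assert(c != NULL);
	if (c->owner < 0)
		return -1;
	if (c->pending > 0)
		c->pending--;            /* "print" one record */
	if (c->waiter >= 0) {
		c->owner = c->waiter;    /* hand off to the waiter */
		c->waiter = -1;
	} else if (c->pending == 0) {
		c->owner = -1;           /* nothing left, release */
	}
	return c->owner;
}
```

in this model a context that has just handed off can call
try_become_waiter() again right away, which is the "then becomes new
waiter" step from the diagram.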
there are corner cases here. learned the hard way. real reproducers
do exist.
wake_up_process() may enqueue printk_thread on the same rq that
the current printk task is running on. so if your printk(), for instance,
is from IRQ then offloading won't happen.
> That is, we keep the waiter logic, and if anyone starts printing too
> much, it wakes up the printk thread (hopefully on another CPU, or the
> printk thread should migrate) when the printk thread starts running it
it must migrate, yes. currently I'm playing games with the affinity
mask of printk-kthread when I do offloading.
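the affinity code itself is not shown in this message. as a rough
illustration of the idea only: exclude the CPU that is busy printing, so
that the woken thread has to land somewhere else. the helper below is a
plain user-space sketch with a hypothetical name (offload_cpu_mask) and a
64-bit bitmask standing in for a cpumask; in the kernel this would
presumably go through the cpumask API and something like
set_cpus_allowed_ptr() instead.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a bitmask of CPUs the printing thread may
 * run on, leaving out the CPU that is currently stuck in printk().
 * Bit N set means CPU N is allowed. Supports up to 64 CPUs.
 */
static uint64_t offload_cpu_mask(unsigned int nr_cpus, unsigned int busy_cpu)
{
	uint64_t all, mask;

	assert(nr_cpus >= 1 && nr_cpus <= 64);

	/* All CPUs in the system; avoid shifting by 64, which is undefined. */
	all = (nr_cpus == 64) ? UINT64_MAX : ((UINT64_C(1) << nr_cpus) - 1);

	mask = all;
	if (busy_cpu < nr_cpus)
		mask &= ~(UINT64_C(1) << busy_cpu);

	/* On UP there is nowhere to migrate to: keep the only CPU. */
	return mask ? mask : all;
}
```

with 4 CPUs and CPU1 busy printing, the mask comes out as 0xd (CPUs 0, 2
and 3), so the thread cannot be enqueued on the rq of the printing CPU.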
> becomes the new waiter if the console lock is still held (just like in
> printk). Then it gets handed off the printk. We could just have the
> printk thread keep going, though I'm not sure I would want to let it
> schedule while printing.
yes, scheduling under console_sem is not right. we don't want this.
not anymore, at least.
> But it could also hand off printks (like
> above), but then take it back immediately. This would mean that a
> printk caller from a "critical" path will only get to do one message,
> before the printk thread asks for it again.
>
> Perhaps we could have more than one printk thread that migrates around,
> and they each hand off the printing. This makes sure the printing
> always happens and that it never stops due to the console_lock holder
> sleeping and we never lock up one CPU that does printing. This would
> work with just two printk threads. When one starts a printk loop,
> another one wakes up on another CPU and becomes the waiter to get the
> handoff of the console_lock. Then the first could schedule out (migrate
> if the current CPU is busy), and take over. In fact, this would
> basically have two CPUs bouncing back and forth to do the printing.
can be. I pushed it much further, once. [probably too far].
and had per-CPU printk-kthreads :)
> This gives us our cake and we get to eat it too.
>
> One, printing never stops (no scheduling out), as there are two threads
> to share the load (obviously only on SMP machines).
>
> There's no lock up. There are two threads that print a little, pass off
> the console lock, do a cond_resched(), then take over again.
>
> Basically, what I'm saying is that this is not two different solutions.
> There are two algorithms that can work together to give us reliable
> output and not lock up the system in doing so.
sure, I understand.
-ss