[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160812094447.GD7339@pathway.suse.cz>
Date: Fri, 12 Aug 2016 11:44:47 +0200
From: Petr Mladek <pmladek@...e.com>
To: Viresh Kumar <viresh.kumar@...aro.org>
Cc: Jan Kara <jack@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
Jan Kara <jack@...e.com>, Tejun Heo <tj@...nel.org>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Byungchul Park <byungchul.park@....com>,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
vlevenetz@...sol.com,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH v10 1/2] printk: Make printk() completely async
On Wed 2016-08-10 14:17:55, Viresh Kumar wrote:
> +Vladi/Greg,
>
> On Wed, Apr 6, 2016 at 1:27 AM, Jan Kara <jack@...e.cz> wrote:
> > On Mon 04-04-16 15:51:49, Andrew Morton wrote:
>
> >> > +static int __init init_printk_kthread(void)
> >> > +{
> >> > + struct task_struct *thread;
> >> > +
> >> > + if (printk_sync)
> >> > + return 0;
> >> > +
> >> > + thread = kthread_run(printk_kthread_func, NULL, "printk");
> >>
> >> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO
> >> task will block printk for ever. This seems bad.
> >
> > I have to research this a bit but won't the SCHED_FIFO task that has
> > potentially unbounded amount of work lockup the CPU even though it does
> > occasional cond_resched()?
>
> We are facing complete hogs because of the printk thread being a SCHED_FIFO
> task and have this patch to fix it up for now.
>
> Author: Vladislav Levenetz <vblagoev@...sol.com>
> Date: Wed Aug 10 13:58:00 2016 -0700
>
> SW-7786: printk: Lower the priority of printk thread
>
> Flooding the console (with a test module) in a tight loop indefinitely
> makes android user interface very sluggish. Opening YouTube app and the
> device hangs and becomes even more unresponsive to the point it
> completely hangs.
>
> The asynchronous printk thread is a SCHED FIFO thread with priority
> MAX_RT_PRIO - 1. If we create it as a simple thread (i.e. no SCHED FIFO)
> instead, we observe much better performance using the same printk flood
> test. We don't even notice any kind of sluggishness during device usage.
> We can play a YouTube clip smoothly and use the device normally in
> general. The kernel log looks fine as well, as the flood of messages
> continue normally.
>
> Signed-off-by: Vladislav Levenetz <vblagoev@...sol.com>
> Signed-off-by: Viresh Kumar <viresh.kumar@...aro.org>
> ---
> kernel/printk/printk.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c32872872cb6..ad5b30e5e6d9 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2856,9 +2856,6 @@ static int printk_kthread_func(void *data)
> static int __init_printk_kthread(void)
> {
> struct task_struct *thread;
> - struct sched_param param = {
> - .sched_priority = MAX_RT_PRIO - 1,
> - };
>
> if (!printk_kthread_can_run || printk_sync || printk_kthread)
> return 0;
> @@ -2870,7 +2867,6 @@ static int __init_printk_kthread(void)
> return PTR_ERR(thread);
> }
>
> - sched_setscheduler(thread, SCHED_FIFO, ¶m);
> printk_kthread = thread;
> return 0;
> }
IMHO, this is fine. We force the synchronous mode in critical
situations anyway.
But I was curious if we could hit a printk from the wake_up_process().
The change above causes using the fair scheduler and there is
the following call chain [*]
vprintk_emit()
-> wake_up_process()
-> try_to_wake_up()
-> ttwu_queue()
-> ttwu_do_activate()
-> ttwu_activate()
-> activate_task()
-> enqueue_task()
-> enqueue_task_fair() via p->sched_class->enqueue_task
-> cfs_rq_of()
-> task_of()
-> WARN_ON_ONCE(!entity_is_task(se))
We should never trigger this because printk_kthread is a task.
But what if the date gets inconsistent?
Then there is the following chain:
vprintk_emit()
-> wake_up_process()
-> try_to_wake_up()
-> ttwu_queue()
-> ttwu_do_activate()
-> ttwu_activate()
-> activate_task()
-> enqueue_task()
-> enqueue_task_fair() via p->sched_class->enqueue_task
->hrtick_update()
-> hrtick_start_fair()
-> WARN_ON(task_rq(p) != rq)
This looks like another paranoid consistency check that might be
triggered when the scheduler gets messed.
I see few possible solutions:
1. Replace the WARN_ONs by printk_deferred().
This is the usual solution but it would make debugging less convenient.
2. Force synchronous printk inside WARN()/BUG() macros.
This would make sense even from other reasons. These are printed
when the system is in a strange state. There is no guarantee that
the printk_kthread will get scheduled.
3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
printk_func.
It might be elegant. But we do not want this outside the scheduler
code. Therefore we would need special variants of WARN_*_SCHED()
BUG_*_SCHED() macros.
I personally prefer the 2nd solution. What do you think about it,
please?
Best Regards,
Petr
Powered by blists - more mailing lists