linux-kernel - Re: [PATCH v10 1/2] printk: Make printk() completely async

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <57B1D12A.6030106@mm-sol.com>
Date:	Mon, 15 Aug 2016 17:26:50 +0300
From:	Vladislav Levenetz <vlevenetz@...sol.com>
To:	Petr Mladek <pmladek@...e.com>,
	Viresh Kumar <viresh.kumar@...aro.org>
Cc:	Jan Kara <jack@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>,
	Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
	Jan Kara <jack@...e.com>, Tejun Heo <tj@...nel.org>,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Byungchul Park <byungchul.park@....com>,
	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Subject: Re: [PATCH v10 1/2] printk: Make printk() completely async



On 08/12/2016 12:44 PM, Petr Mladek wrote:
> On Wed 2016-08-10 14:17:55, Viresh Kumar wrote:
>> +Vladi/Greg,
>>
>> On Wed, Apr 6, 2016 at 1:27 AM, Jan Kara <jack@...e.cz> wrote:
>>> On Mon 04-04-16 15:51:49, Andrew Morton wrote:
>>>>> +static int __init init_printk_kthread(void)
>>>>> +{
>>>>> +   struct task_struct *thread;
>>>>> +
>>>>> +   if (printk_sync)
>>>>> +           return 0;
>>>>> +
>>>>> +   thread = kthread_run(printk_kthread_func, NULL, "printk");
>>>> This gets normal scheduling policy, so a spinning userspace SCHED_FIFO
>>>> task will block printk for ever.  This seems bad.
>>> I have to research this a bit but won't the SCHED_FIFO task that has
>>> potentially unbounded amount of work lockup the CPU even though it does
>>> occasional cond_resched()?
>> We are facing complete hogs because of the printk thread being a SCHED_FIFO
>> task and have this patch to fix it up for now.
>>
>> Author: Vladislav Levenetz <vblagoev@...sol.com>
>> Date:   Wed Aug 10 13:58:00 2016 -0700
>>
>>      SW-7786: printk: Lower the priority of printk thread
>>
>>      Flooding the console (with a test module) in a tight loop indefinitely
>>      makes android user interface very sluggish. Opening YouTube app and the
>>      device hangs and becomes even more unresponsive to the point it
>>      completely hangs.
>>
>>      The asynchronous printk thread is a SCHED FIFO thread with priority
>>      MAX_RT_PRIO - 1. If we create it as a simple thread (i.e. no SCHED FIFO)
>>      instead, we observe much better performance using the same printk flood
>>      test. We don't even notice any kind of sluggishness during device usage.
>>      We can play a YouTube clip smoothly and use the device normally in
>>      general.  The kernel log looks fine as well, as the flood of messages
>>      continue normally.
>>
>>      Signed-off-by: Vladislav Levenetz <vblagoev@...sol.com>
>>      Signed-off-by: Viresh Kumar <viresh.kumar@...aro.org>
>> ---
>>   kernel/printk/printk.c | 4 ----
>>   1 file changed, 4 deletions(-)
>>
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index c32872872cb6..ad5b30e5e6d9 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2856,9 +2856,6 @@ static int printk_kthread_func(void *data)
>>   static int __init_printk_kthread(void)
>>   {
>>          struct task_struct *thread;
>> -       struct sched_param param = {
>> -               .sched_priority = MAX_RT_PRIO - 1,
>> -       };
>>
>>          if (!printk_kthread_can_run || printk_sync || printk_kthread)
>>                  return 0;
>> @@ -2870,7 +2867,6 @@ static int __init_printk_kthread(void)
>>                  return PTR_ERR(thread);
>>          }
>>
>> -       sched_setscheduler(thread, SCHED_FIFO, &param);
>>          printk_kthread = thread;
>>          return 0;
>>   }
> IMHO, this is fine. We force the synchronous mode in critical
> situations anyway.
>
> But I was curious if we could hit a printk from the wake_up_process().
> The change above causes using the fair scheduler and there is
> the following call chain [*]
>
>    vprintk_emit()
>    -> wake_up_process()
>     -> try_to_wake_up()
>      -> ttwu_queue()
>       -> ttwu_do_activate()
>        -> ttwu_activate()
>         -> activate_task()
> 	-> enqueue_task()
> 	 -> enqueue_task_fair()	    via p->sched_class->enqueue_task
> 	  -> cfs_rq_of()
> 	   -> task_of()
> 	    -> WARN_ON_ONCE(!entity_is_task(se))
>
> We should never trigger this because printk_kthread is a task.
> But what if the date gets inconsistent?
>
> Then there is the following chain:
>
>    vprintk_emit()
>    -> wake_up_process()
>     -> try_to_wake_up()
>      -> ttwu_queue()
>       -> ttwu_do_activate()
>        -> ttwu_activate()
>         -> activate_task()
> 	-> enqueue_task()
> 	 -> enqueue_task_fair()	    via p->sched_class->enqueue_task
> 	  ->hrtick_update()
> 	   -> hrtick_start_fair()
> 	    -> WARN_ON(task_rq(p) != rq)
>
> This looks like another paranoid consistency check that might be
> triggered when the scheduler gets messed.
>
> I see few possible solutions:
>
> 1. Replace the WARN_ONs by printk_deferred().
>
>     This is the usual solution but it would make debugging less convenient.
>
>
> 2. Force synchronous printk inside WARN()/BUG() macros.
>
>     This would make sense even from other reasons. These are printed
>     when the system is in a strange state. There is no guarantee that
>     the printk_kthread will get scheduled.
>
>
> 3. Force printk_deferred() inside WARN()/BUG() macros via the per-CPU
>     printk_func.
>
>     It might be elegant. But we do not want this outside the scheduler
>     code. Therefore we would need special variants of  WARN_*_SCHED()
>     BUG_*_SCHED() macros.
>
>
> I personally prefer the 2nd solution. What do you think about it,
> please?
>
>
> Best Regards,
> Petr

Hi Petr,

Sorry with for the late reply.

Hitting a WARN()/BUG() from wake_up calls will lead to a deadlock if 
only a single CPU is running.
We already had such a situation with system suspend. During a specific 
test on our device sometimes we hit a WARN from the time keeping core. 
(Cannot recall which one exactly. Viresh have it) from a printk wake_up 
path during system suspend and with already only one CPU running.
So we were forced to make printing synchronous in the suspend path prior 
disabling all non-boot cpu's.

Your solution number 2) sounds reasonable to me.

I'm wondering if we could hit a WARN()/BUG() somewhere from the fair 
scheduler like the example you made for the RT sched?

Thanks.