[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160713001808.GT4695@ubuntu>
Date: Tue, 12 Jul 2016 17:18:08 -0700
From: Viresh Kumar <viresh.kumar@...aro.org>
To: Jan Kara <jack@...e.cz>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
rjw@...ysocki.net
Cc: Tejun Heo <tj@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
vlevenetz@...sol.com, vaibhav.hiremath@...aro.org,
alex.elder@...aro.org, johan@...nel.org, akpm@...ux-foundation.org,
rostedt@...dmis.org,
Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
linux-pm@...r.kernel.org
Subject: Re: [Query] Preemption (hogging) of the work handler
On 12-07-16, 16:19, Viresh Kumar wrote:
> Okay, we have tracked this BUG and its really interesting.
>
> I hacked the platform's serial driver to implement a putchar() routine
> that simply writes to the FIFO in polling mode, that helped us in
> tracing on where we are going wrong.
>
> The problem is that we are running asynchronous printks and we call
> wake_up_process() from the last running CPU which has disabled
> interrupts. That takes us to: try_to_wake_up().
>
> In our case the CPU gets deadlocked on this line in try_to_wake_up().
>
> raw_spin_lock_irqsave(&p->pi_lock, flags);
>
> I will explain how:
>
> The try_to_wake_up() function takes us through the scheduler code (RT
> sched), to the hrtimer code, where we eventually call ktime_get() (for
> the MONOTONIC clock used for hrtimer). And this function has this:
>
> WARN_ON(timekeeping_suspended);
>
> This starts another printk while we are in the middle of
> wake_up_process() and the CPU tries to take the above lock again and
> gets stuck there :)
>
> This doesn't happen everytime because we don't always call ktime_get()
> and it is called only if hrtimer_active() returns false.
>
> This happened because of a WARN_ON() but it can happen anyway. Think
> about this case:
>
> - offline all CPUs, except 0
> - call any routine that prints messages after disabling interrupts,
> etc.
> - If any of the function within wake_up_process() does a print, we are
> screwed.
>
> So the thing is that we can't really call wake_up_process() in cases
> where the last CPU disables interrupts. And that's why my fixup patch
> (which moved to synchronous prints after suspend) really works.
Actually, any printk done from wake_up_process() will hit this, even
if all the others CPUs are up as well :)
Its only BUG_ON() which has special handling in printk, and so we
print that safely.
--
viresh
Powered by blists - more mailing lists