[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160714221251.GE3057@ubuntu>
Date: Thu, 14 Jul 2016 15:12:51 -0700
From: Viresh Kumar <viresh.kumar@...aro.org>
To: Jan Kara <jack@...e.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
rjw@...ysocki.net, Tejun Heo <tj@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
vlevenetz@...sol.com, vaibhav.hiremath@...aro.org,
alex.elder@...aro.org, johan@...nel.org, akpm@...ux-foundation.org,
rostedt@...dmis.org, linux-pm@...r.kernel.org,
Petr Mladek <pmladek@...e.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Query] Preemption (hogging) of the work handler
On 14-07-16, 16:12, Jan Kara wrote:
> Exactly. Calling printk() from certain parts of the kernel (like scheduler
> code or timer code) has been always unsafe because printk itself uses these
> parts and so it can lead to deadlocks. That's why printk_deffered() has
> been introduced as you mention below.
>
> And with sync printk the above deadlock doesn't trigger only by chance - if
> there happened to be a waiter on console_sem while we suspend, the same
> deadlock would trigger because up(&console_sem) will try to wake him up and
> the warning in timekeeping code will cause recursive printk.
>
> So I think your patch doesn't really address the real issue - it only
> works around the particular WARN_ON(timekeeping_enabled) warning but if
> there was a different warning in timekeeping code which would trigger, it
> has a potential for causing recursive printk deadlock (and indeed we had
> such issues previously - see e.g. 504d58745c9c "timer: Fix lock inversion
> between hrtimer_bases.lock and scheduler locks").
>
> So there are IMHO two issues here worth looking at:
>
> 1) I didn't find how a wakeup would would lead to calling to ktime_get() in
> the current upstream kernel or even current RT kernel. Maybe this is a
> problem specific to the 3.10 kernel you are using? If yes, we don't have to
> do anything for current upstream AFAIU.
I haven't checked that earlier, but I see the path in both 3.10 and mainline.
vprintk_emit
-> wake_up_process
-> try_to_wake_up
-> ttwu_queue
-> ttwu_do_activate
-> ttwu_activate
-> activate_task
-> enqueue_task (sched/core.c)
-> enqueue_task_rt (rt.c)
-> enqueue_rt_entity
-> __enqueue_rt_entity
-> inc_rt_tasks
-> inc_rt_group
-> start_rt_bandwidth
-> start_bandwidth_timer
-> __hrtimer_start_range_ns
-> ktime_get()
> If I just missed how wakeup can call into ktime_get() in current upstream,
> there is another question:
>
> 2) Is it OK that printk calls wakeup so late during suspend?
To clarify again to everybody, we are talking about the place where all non-boot
CPUs are already hot-unplugged and the last running one has disabled interrupts.
I believe that we can't do migration at all now, right? What will we get by
calling wake_up_process() now anyway ?
> I believe it
> is but I'm neither scheduler nor suspend expert. If it is OK, and wakeup
> can lead to ktime_get() in current upstream, then this contradicts the
> check WARN_ON(timekeeping_suspended) in ktime_get() and something is wrong.
>
> Adding Thomas to CC as timer / RT expert...
Thanks.
--
viresh
Powered by blists - more mailing lists