Message-ID: <20160718110134.GB6782@quack2.suse.cz>
Date:	Mon, 18 Jul 2016 13:01:34 +0200
From:	Jan Kara <jack@...e.cz>
To:	Viresh Kumar <viresh.kumar@...aro.org>
Cc:	Jan Kara <jack@...e.cz>,
	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
	rjw@...ysocki.net, Tejun Heo <tj@...nel.org>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	vlevenetz@...sol.com, vaibhav.hiremath@...aro.org,
	alex.elder@...aro.org, johan@...nel.org, akpm@...ux-foundation.org,
	rostedt@...dmis.org, linux-pm@...r.kernel.org,
	Petr Mladek <pmladek@...e.com>,
	Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Query] Preemption (hogging) of the work handler

On Thu 14-07-16 15:12:51, Viresh Kumar wrote:
> On 14-07-16, 16:12, Jan Kara wrote:
> > Exactly. Calling printk() from certain parts of the kernel (like scheduler
> > code or timer code) has always been unsafe because printk itself uses these
> > parts and so it can lead to deadlocks. That's why printk_deferred() was
> > introduced, as you mention below.
> > 
> > And with sync printk the above deadlock doesn't trigger only by chance - if
> > there happened to be a waiter on console_sem while we suspend, the same
> > deadlock would trigger because up(&console_sem) will try to wake him up and
> > the warning in timekeeping code will cause recursive printk.
> > 
> > So I think your patch doesn't really address the real issue - it only
> > works around the particular WARN_ON(timekeeping_enabled) warning but if
> > there was a different warning in timekeeping code which would trigger, it
> > has a potential for causing recursive printk deadlock (and indeed we had
> > such issues previously - see e.g. 504d58745c9c "timer: Fix lock inversion
> > between hrtimer_bases.lock and scheduler locks").
> > 
> > So there are IMHO two issues here worth looking at:
> > 
> > 1) I didn't find how a wakeup would lead to calling ktime_get() in
> > the current upstream kernel or even current RT kernel. Maybe this is a
> > problem specific to the 3.10 kernel you are using? If yes, we don't have to
> > do anything for current upstream AFAIU.
> 
> I haven't checked that earlier, but I see the path in both 3.10 and mainline.
> 
> vprintk_emit
>  -> wake_up_process
>   -> try_to_wake_up
>    -> ttwu_queue
>     -> ttwu_do_activate
>      -> ttwu_activate
>       -> activate_task
>        -> enqueue_task (sched/core.c)
>         -> enqueue_task_rt (rt.c)
>          -> enqueue_rt_entity
>           -> __enqueue_rt_entity
>            -> inc_rt_tasks
>             -> inc_rt_group
>              -> start_rt_bandwidth
>               -> start_bandwidth_timer
>                -> __hrtimer_start_range_ns
>                 -> ktime_get()

Yeah, you are right.
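
To make the recursion concrete, here is a minimal user-space sketch in C. It
is not kernel code: every model_* name is hypothetical, the single
console_locked flag stands in for whichever non-recursive lock is held across
the wakeup, and model_ktime_get() only imitates the timekeeping warning. Where
the real kernel could deadlock, the model just counts the re-entry. The
deferred variant shows the general idea behind printk_deferred() (store the
message now, emit it later from a safe context), not its actual implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEFERRED_SLOTS 8
#define MSG_LEN 64

static bool console_locked;        /* models a non-recursive lock held by the printing path */
static bool timekeeping_suspended; /* models the state the timekeeping warning checks */
static bool warn_uses_deferred;    /* selects which printing variant the warning uses */
static int recursion_hits;         /* nested prints that found the lock already held */
static int printed;                /* messages that reached the "console" */
static char deferred[DEFERRED_SLOTS][MSG_LEN];
static int deferred_count;

static void model_printk(const char *msg);

/* Write one message to the "console" (stdout here). */
static void model_emit(const char *msg)
{
    assert(msg != NULL);
    puts(msg);
    printed++;
}

/* Deferred variant: only buffer the message; take no lock, wake nobody. */
static void model_printk_deferred(const char *msg)
{
    if (deferred_count >= DEFERRED_SLOTS)
        return; /* buffer full: drop the message */
    strncpy(deferred[deferred_count], msg, MSG_LEN - 1);
    deferred[deferred_count][MSG_LEN - 1] = '\0';
    deferred_count++;
}

/* Imitates the end of the call chain: warn if timekeeping is suspended. */
static void model_ktime_get(void)
{
    if (!timekeeping_suspended)
        return;
    if (warn_uses_deferred)
        model_printk_deferred("WARN: timekeeping suspended");
    else
        model_printk("WARN: timekeeping suspended");
}

/* Collapses wake_up_process() -> ... -> ktime_get() into one step. */
static void model_wake_up(void)
{
    model_ktime_get();
}

/* Synchronous print: takes the lock, emits, then wakes a waiter. */
static void model_printk(const char *msg)
{
    if (console_locked) {
        recursion_hits++; /* the real code could deadlock at this point */
        return;
    }
    console_locked = true;
    model_emit(msg);
    model_wake_up();
    console_locked = false;
}

/* Called later from a safe context to emit the buffered messages. */
static void model_flush_deferred(void)
{
    for (int i = 0; i < deferred_count; i++)
        model_emit(deferred[i]);
    deferred_count = 0;
}
```

With timekeeping_suspended set and warn_uses_deferred clear, one call to
model_printk() re-enters itself through the wakeup and bumps recursion_hits.
With warn_uses_deferred set, the warning is only buffered and comes out when
model_flush_deferred() is called.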

> > If I just missed how wakeup can call into ktime_get() in current upstream,
> > there is another question:
> > 
> > 2) Is it OK that printk calls wakeup so late during suspend?
> 
> To clarify again to everybody, we are talking about the place where all
> non-boot CPUs are already hot-unplugged and the last running one has
> disabled interrupts.
> 
> I believe that we can't do migration at all now, right? What will we get by
> calling wake_up_process() now anyway?

As I already wrote to Rafael, wake_up_process() will change the process
state to TASK_RUNNING so that it can run after we resume from suspend.
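
A toy illustration of that point, again as a user-space sketch in C with
purely hypothetical model_* names (the real scheduler is far more involved):
the wakeup only marks the task runnable, and the task actually runs once
scheduling happens again after resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_state { MODEL_TASK_RUNNING, MODEL_TASK_INTERRUPTIBLE };

struct model_task {
    enum model_state state;
    int times_run;
};

static bool model_suspended; /* true while the system is suspended */

/* Mark the task runnable; returns true if the state actually changed. */
static bool model_wake_up_process(struct model_task *t)
{
    assert(t != NULL);
    if (t->state == MODEL_TASK_RUNNING)
        return false;
    t->state = MODEL_TASK_RUNNING;
    return true;
}

/* One scheduling pass: runnable tasks run only when not suspended. */
static void model_schedule(struct model_task *t)
{
    assert(t != NULL);
    if (model_suspended)
        return;
    if (t->state == MODEL_TASK_RUNNING)
        t->times_run++;
}
```

Waking a sleeping task while model_suspended is true changes its state but
leaves times_run untouched; the first model_schedule() after clearing
model_suspended is what runs it.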

But seeing that the same problem exists in upstream, I guess what Sergey did
makes more sense, if it works for you. If Sergey's fix does not work for you
because too many messages are printed during device suspend, then we will
have to try something else...

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
