linux-kernel - Re: [BUG] cgroup/workques/fork: deadlock when moving cgroups

Message-ID: <20160413183309.GG3676@htj.duckdns.org>
Date:	Wed, 13 Apr 2016 14:33:09 -0400
From:	Tejun Heo <tj@...nel.org>
To:	Petr Mladek <pmladek@...e.com>
Cc:	cgroups@...r.kernel.org, Michal Hocko <mhocko@...e.cz>,
	Cyril Hrubis <chrubis@...e.cz>, linux-kernel@...r.kernel.org,
	Johannes Weiner <hannes@...xchg.org>
Subject: Re: [BUG] cgroup/workques/fork: deadlock when moving cgroups

Hello, Petr.

(cc'ing Johannes)

On Wed, Apr 13, 2016 at 11:42:16AM +0200, Petr Mladek wrote:
...
> In other words, "memcg_move_char/2860" flushes a work item. But it
> cannot be flushed because one worker is blocked and another one could
> not be created. All these operations are blocked by the very same
> "memcg_move_char/2860".
> 
> Note that "systemd/1" is also waiting for "cgroup_mutex" in
> proc_cgroup_show(). But it does not seem to be part of the main
> cycle causing the deadlock.
> 
> I am able to reproduce this problem quite easily (within a few minutes).
> There are often even more tasks waiting for the cgroups-related locks
> but they are not causing the deadlock.
> 
> 
> The question is how to solve this problem. I see several possibilities:
> 
>   + avoid using workqueues in lru_add_drain_all()
> 
>   + make lru_add_drain_all() killable and restartable
> 
>   + do not block fork() when lru_add_drain_all() is running,
>     e.g. using some lazy techniques like RCU, workqueues
> 
>   + at least do not block fork of workers; AFAIK, they make limited
>     use of cgroups anyway because they are marked with PF_NO_SETAFFINITY
> 
> 
> I am willing to test any potential fix or even work on the fix.
> But I do not have much insight into the problem, so I would
> need some pointers.

An easy solution would be to make lru_add_drain_all() use a
WQ_MEM_RECLAIM workqueue; its rescuer thread is created up front, so
the flush no longer depends on forking a new worker.  A better way
would be to make charge moving asynchronous, similar to cpuset node
migration, but I don't know whether that's realistic.  I will prep a
patch to add a rescuer to lru_add_drain_all().

Thanks.

-- 
tejun
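
The dependency cycle described in the quoted report, and why the
WQ_MEM_RECLAIM suggestion would break it, can be sketched with a toy
userspace model.  This is not kernel code: struct wq_model and
flush_can_complete() are hypothetical names made up for illustration,
and the model assumes only what the thread states (a flush needs a free
worker, a new worker needs fork(), and fork() is blocked by the task
doing the flush).  In the kernel itself the flag is passed to
alloc_workqueue() when the workqueue is created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one workqueue as seen by a task that calls flush. */
struct wq_model {
    int  idle_workers;  /* workers free to run a queued item */
    bool fork_blocked;  /* flusher holds a lock that fork() waits on */
    bool has_rescuer;   /* models WQ_MEM_RECLAIM: thread created up front */
};

/* Returns true if a queued item can eventually run, so a flush finishes. */
static bool flush_can_complete(const struct wq_model *wq)
{
    assert(wq != NULL);

    if (wq->idle_workers > 0)
        return true;            /* an existing worker picks the item up */
    if (!wq->fork_blocked)
        return true;            /* the pool can still spawn a new worker */
    return wq->has_rescuer;     /* only a pre-created thread can help now */
}
```

With idle_workers = 0 and fork_blocked = true, the function returns
false unless has_rescuer is set, which mirrors the reported hang and the
proposed way out of it.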