linux-kernel - Re: [BUG] cgroup/workques/fork: deadlock when moving cgroups

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160419140120.GA4126@dhcp22.suse.cz>
Date:	Tue, 19 Apr 2016 10:01:21 -0400
From:	Michal Hocko <mhocko@...nel.org>
To:	Petr Mladek <pmladek@...e.com>
Cc:	Tejun Heo <tj@...nel.org>, Johannes Weiner <hannes@...xchg.org>,
	cgroups@...r.kernel.org, Cyril Hrubis <chrubis@...e.cz>,
	linux-kernel@...r.kernel.org
Subject: Re: [BUG] cgroup/workques/fork: deadlock when moving cgroups

On Mon 18-04-16 16:40:23, Petr Mladek wrote:
> On Fri 2016-04-15 10:38:15, Tejun Heo wrote:
> > > Anyway, before we go that way, can we at least consider the possibility
> > > of removing the kworker creation dependency on the global rwsem? AFAIU
> > > this locking was added because of the pid controller. Do we even care
> > > about something as volatile as kworkers in the pid controller?
> >
> > It's not just pid controller and the global percpu locking has lower
> > hotpath overhead.  We can try to exclude kworkers out of the locking
> > but that can get really nasty and there are already attempts to add
> > cgroup support to workqueue.  Will think more about it.
> 
> I have played with this idea on Friday. Please, find below a POC.
> The worker detection works and the deadlock is removed. But workers
> do not appear in the root cgroups. I am not familiar with the cgroups
> stuff, so this part is much more difficult for me.
> 
> I send it because it might give you an idea when discussing it
> on LSF. Please, let me know if I should continue on this way or
> if it looks too crazy already now.
> 
> 
> >From ca1420926f990892a914d64046ee8d273b876f30 Mon Sep 17 00:00:00 2001
> From: Petr Mladek <pmladek@...e.com>
> Date: Mon, 18 Apr 2016 14:17:17 +0200
> Subject: [POC PATCH] cgroups/workqueus: Do not block forking workqueues by cgroups
>  lock
> 
> This is a POC how to delay cgroups operations when forking workqueue
> workers.  Workqueues are used by some cgroups operations, for example,
> lru_add_drain_all().  Creating new worker might help to avoid a deadlock.
> 
> This patch adds a way to detect whether a workqueue worker is being
> forked, see detect_wq_worker().  For this it needs to make struct
> kthread_create_info and the main worker_thread public.  As a consequence,
> it renames worker_thread to wq_worker_thread.
> 
> Next, cgroups_fork() just initializes the cgroups fields in task_struct.
> It does not really need to be guarded by cgroup_threadgroup_rwsem.
> 
> cgroup_threadgroup_rwsem is taken later when we check if the fork
> is allowed and when we copy the cgroups setting.  But these two
> operations are skipped for workers.
> 
> The result is that workers are not blocked by any cgroups operation
> but they do not appear in the root cgroups.
> 
> There is a preparation of cgroup_delayed_post_fork() that might put
> the workers into the root cgroups.  It is just a copy of
> cgroup_post_fork() where "kthreadd_task" is hardcoded.  It is not yet
> called.  Also it is not protected against other cgroups operations.
> ---
>  include/linux/kthread.h   | 14 +++++++++++++
>  include/linux/workqueue.h |  1 +
>  kernel/cgroup.c           | 53 +++++++++++++++++++++++++++++++++++++++++++++++
>  kernel/fork.c             | 36 +++++++++++++++++++++++---------
>  kernel/kthread.c          | 14 -------------
>  kernel/workqueue.c        |  9 ++++----
>  6 files changed, 98 insertions(+), 29 deletions(-)

This feels too overcomplicated. Can we simply drop the locking in
copy_process if the current == ktreadadd? Would something actually
break?
-- 
Michal Hocko
SUSE Labs