[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160420212922.GH4775@htj.duckdns.org>
Date: Wed, 20 Apr 2016 17:29:22 -0400
From: Tejun Heo <tj@...nel.org>
To: Michal Hocko <mhocko@...nel.org>
Cc: Johannes Weiner <hannes@...xchg.org>,
Petr Mladek <pmladek@...e.com>, cgroups@...r.kernel.org,
Cyril Hrubis <chrubis@...e.cz>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH for-4.6-fixes] memcg: remove lru_add_drain_all()
invocation from mem_cgroup_move_charge()
Hello, Michal.
On Sun, Apr 17, 2016 at 08:07:48AM -0400, Michal Hocko wrote:
> On Fri 15-04-16 15:17:19, Tejun Heo wrote:
> > mem_cgroup_move_charge() invokes lru_add_drain_all() so that the pvec
> > pages can be moved too. lru_add_drain_all() schedules and flushes
> > work items on system_wq which depends on being able to create new
> > kworkers to make forward progress. Since 1ed1328792ff ("sched,
> > cgroup: replace signal_struct->group_rwsem with a global
> > percpu_rwsem"), a new task can't be created while in the cgroup
> > migration path and the described lru_add_drain_all() invocation can
> > easily lead to a deadlock.
> >
> > Charge moving is best-effort and whether the pvec pages are migrated
> > or not doesn't really matter. Don't call it during charge moving.
> > Eventually, we want to move the actual charge moving outside the
> > migration path.
> >
> > Signed-off-by: Tejun Heo <tj@...nel.org>
> > Reported-by: Johannes Weiner <hannes@...xchg.org>
>
> I guess
> Debugged-by: Petr Mladek <pmladek@...e.com>
> Reported-by: Cyril Hrubis <chrubis@...e.cz>
Yeah, definitely. Sorry about missing them.
> > Suggested-by: Michal Hocko <mhocko@...nel.org>
> > Fixes: 1ed1328792ff ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem")
> > Cc: stable@...r.kernel.org
> > ---
> > Hello,
> >
> > So, this deadlock seems pretty easy to trigger. We'll make the charge
> > moving asynchronous eventually but let's not hold off fixing an
> > immediate problem.
>
> Although this looks rather straightforward and it fixes the immediate
> problem I am little bit nervous about it. As already pointed out in
> other email mem_cgroup_move_charge still depends on mmap_sem for
> read and we might hit an even more subtle lockup if the current holder
> of the mmap_sem for write depends on the task creation (e.g. some of the
> direct reclaim path uses WQ which is really hard to rule out and I even
> think that some shrinkers do this).
>
> I liked your proposal when mem_cgroup_move_charge would be called from a
> context which doesn't hold the problematic rwsem much more. Would that
> be too intrusive for the stable backport?
Yeah, I'm working on the fix but let's plug this one first as it seems
really easy to trigger. I got a couple off-list reports (in and
outside fb) of this triggering.
Thanks.
--
tejun
Powered by blists - more mailing lists