Message-ID: <87wovu889o.fsf@xmission.com>
Date: Wed, 23 May 2018 14:46:43 -0500
From: ebiederm@...ssion.com (Eric W. Biederman)
To: Michal Hocko <mhocko@...nel.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Kirill Tkhai <ktkhai@...tuozzo.com>, peterz@...radead.org,
viro@...iv.linux.org.uk, mingo@...nel.org,
paulmck@...ux.vnet.ibm.com, keescook@...omium.org, riel@...hat.com,
tglx@...utronix.de, kirill.shutemov@...ux.intel.com,
marcos.souza.org@...il.com, hoeun.ryu@...il.com,
pasha.tatashin@...cle.com, gs051095@...il.com, dhowells@...hat.com,
rppt@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Tejun Heo <tj@...nel.org>, Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH 0/2] mm->owner to mm->memcg fixes

Michal Hocko <mhocko@...nel.org> writes:
> On Thu 10-05-18 14:14:18, Michal Hocko wrote:
>> On Fri 04-05-18 12:26:03, Eric W. Biederman wrote:
>> >
>> > Andrew can you pick up these two fixes as well.
>> >
>> > These address the issues Michal Hocko and Oleg Nesterov noticed.
>>
>> I completely got lost in this thread. There are way too many things
>> discussed at once. Could you repost the full series for a proper review
>> please?
>
> ping

Quick summary of where things stand.  (Just getting back from vacation
and catching up with things.)

Andrew has picked up the best version of these patches, and you can look
at his tree to find the current patches.

Looking at my tree, the issues that have been addressed on top of the
basic patch are:

- !CONFIG_MMU support (code motion)

- Races during exec.  (Roughly solved, but I think we can do better by
  expanding the scope of cgroup_threadgroup_change_begin/end in exec
  and just making exec atomic with respect to cgroup changes; see the
  sketch below.)
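
A rough sketch of what I mean (hypothetical placement, not against any
tree; the hook point and helper name are invented for illustration):

	/* Hypothetical sketch: widen the existing
	 * cgroup_threadgroup_change_begin/end pair so that exec's
	 * point-of-no-return is atomic with respect to cgroup
	 * migration.  exec_commit_sketch() is an invented name,
	 * not an actual kernel function.
	 */
	static int exec_commit_sketch(struct task_struct *tsk)
	{
		cgroup_threadgroup_change_begin(tsk);
		/*
		 * ... de_thread(), install the new mm, and update
		 * mm->memcg from tsk's css_set, all while no cgroup
		 * migration of this thread group can run ...
		 */
		cgroup_threadgroup_change_end(tsk);
		return 0;
	}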

While I was on vacation you posted an old concern about a condition
where get_mem_cgroup_from_mm was not guaranteed to complete, and how
that interacts with charge migration.

Looking at your description, the concern is that cgroup_rmdir can
succeed when a cgroup contains only an mm (and no tasks), with the
result that mem_cgroup_try_charge can loop indefinitely.

That is possible with two processes sharing the same mm but living in
different memory cgroups.  That is a silly, useless configuration, but
something to support at least to the level of making certain the kernel
doesn't wedge when it happens by accident or maliciously.  A rough
userspace recipe is sketched below.
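
For concreteness, here is a hedged userspace sketch of how such a
configuration can arise (the cgroup paths are illustrative, and the
cgroup.procs write is only printed, not performed):

	#define _GNU_SOURCE
	#include <sched.h>
	#include <signal.h>
	#include <stdio.h>
	#include <unistd.h>

	static int child_fn(void *arg)
	{
		pause();	/* keep the shared mm alive */
		return 0;
	}

	int main(void)
	{
		static char stack[64 * 1024];
		pid_t pid;

		/* Assume we already run inside memory cgroup A. */
		pid = clone(child_fn, stack + sizeof(stack),
			    CLONE_VM | SIGCHLD, NULL);
		if (pid < 0) {
			perror("clone");
			return 1;
		}

		/*
		 * Moving the child into a second memory cgroup leaves
		 * two processes sharing one mm in different cgroups.
		 * If the process in A then exits and A is rmdir'd,
		 * mm->memcg can be left pointing at an offline cgroup.
		 */
		printf("echo %d > /sys/fs/cgroup/memory/B/cgroup.procs\n",
		       (int)pid);
		return 0;
	}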

The suggestion of having cgroup_is_populated take this into account
seems like a good general solution, but I don't think that is strictly
necessary; a hypothetical sketch of it follows.
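
Something like this is how I read that suggestion (purely hypothetical;
nr_mms is an invented counter that does not exist today):

	/* Hypothetical sketch: make cgroup_is_populated() also count
	 * mms that still reference the cgroup, so that cgroup_rmdir
	 * fails while such an mm exists.  nr_mms is invented for
	 * illustration only.
	 */
	static inline bool cgroup_is_populated(struct cgroup *cgrp)
	{
		return cgrp->nr_populated_csets +
		       cgrp->nr_populated_domain_children +
		       cgrp->nr_populated_threaded_children +
		       atomic_read(&cgrp->nr_mms);
	}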

In the spirit of "let's make this work", how about:

From: "Eric W. Biederman" <ebiederm@...ssion.com>
Date: Wed, 23 May 2018 13:48:31 -0500
Subject: [PATCH] memcontrol: Guarantee that get_mem_cgroup_from_mm does not
loop forever

The root memory cgroup should always be online, so when returning it
unconditionally bump its refcount.

The only way for mm->memcg to point to an offline memory cgroup is if
CLONE_VM was used to create two processes that share an mm, those
processes were put in different memory cgroups, one of those processes
exited, and cgroup_rmdir was then called on its memory cgroup.

As two processes sharing an mm is useless and highly unlikely, there is
no need to handle this case well; it just needs to be handled well
enough to prevent an indefinite loop.  So when css_tryget_online fails,
just treat the mm as belonging to the root memory cgroup.

The cgroup migration and cgroup remove actions both happen with the
cgroup_mutex held, so there is no need to worry about an mm that is
migrating cgroups not having a live cgroup.

Signed-off-by: "Eric W. Biederman" <ebiederm@...ssion.com>
---
 mm/memcontrol.c | 34 +++++++++++++++++++---------------
 1 file changed, 19 insertions(+), 15 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index d74aeba7dfed..dbb197bfc517 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -666,25 +666,29 @@ static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
 
 static struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 {
-	struct mem_cgroup *memcg = NULL;
+	struct mem_cgroup *memcg;
 
 	rcu_read_lock();
-	do {
-		/*
-		 * Page cache insertions can happen without an
-		 * actual mm context, e.g. during disk probing
-		 * on boot, loopback IO, acct() writes etc.
-		 */
-		if (unlikely(!mm))
-			memcg = root_mem_cgroup;
-		else {
-			memcg = rcu_dereference(mm->memcg);
-			if (unlikely(!memcg))
-				memcg = root_mem_cgroup;
-		}
-	} while (!css_tryget_online(&memcg->css));
+	/*
+	 * Page cache insertions can happen without an
+	 * actual mm context, e.g. during disk probing
+	 * on boot, loopback IO, acct() writes etc.
+	 */
+	if (unlikely(!mm))
+		goto root_memcg;
+
+	memcg = rcu_dereference(mm->memcg);
+	if (unlikely(!memcg))
+		goto root_memcg;
+	if (!css_tryget_online(&memcg->css))
+		goto root_memcg;
 	rcu_read_unlock();
 	return memcg;
+
+root_memcg:
+	css_get(&root_mem_cgroup->css);
+	rcu_read_unlock();
+	return root_mem_cgroup;
 }
 
 /**
--
2.14.1