Message-ID: <20180509144016.GA25742@redhat.com>
Date: Wed, 9 May 2018 16:40:16 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Kirill Tkhai <ktkhai@...tuozzo.com>, akpm@...ux-foundation.org,
peterz@...radead.org, viro@...iv.linux.org.uk, mingo@...nel.org,
paulmck@...ux.vnet.ibm.com, keescook@...omium.org, riel@...hat.com,
tglx@...utronix.de, kirill.shutemov@...ux.intel.com,
marcos.souza.org@...il.com, hoeun.ryu@...il.com,
pasha.tatashin@...cle.com, gs051095@...il.com, dhowells@...hat.com,
rppt@...ux.vnet.ibm.com, linux-kernel@...r.kernel.org,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH] memcg: Replace mm->owner with mm->memcg

On 05/07, Eric W. Biederman wrote:
>
> Oleg Nesterov <oleg@...hat.com> writes:
>
> > before your patch get_mem_cgroup_from_mm() looks at mm->owner == current
> > (in this case) and mem_cgroup_from_task() should return the correct memcg
> > even if execing task migrates after bprm_mm_init(). At least in the common
> > case when the old mm is not shared.
> >
> > After your patch the memory allocations in copy_strings() won't be accounted
> > correctly, bprm->mm->memcg is wrong if this task migrates. And iiuc your recent
> > "[PATCH 2/2] memcg: Close the race between migration and installing bprm->mm as mm"
> > doesn't fix the problem.
> >
> > No?
>
> The patch does solve the issue. There should be nothing a userspace
> process can observe that should tell it where in the middle of exec
> such a migration happened, so placing the migration at what from the
> kernel's perspective might be technically later should not be a problem.
>
> If it is a problem the issue is that there is a way to observe the
> difference.

So. The task migrates from some MEMCG right after bprm_mm_init().

copy_strings() triggers OOM in MEMCG. This is quite possible, it can use a lot
of memory and that is why we have acct_arg_size() to make these allocations
visible to oom killer.

task_in_mem_cgroup(MEMCG) returns false and oom killer has to kill another
innocent process in MEMCG.

Does this look like a way to observe the difference?
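
To make the sequence above concrete, here is a toy userspace model, not kernel
code. The struct names and execer_is_oom_candidate() are hypothetical and exist
only for this sketch. It assumes the old mm is not shared, so the mm->owner
scheme charges the task's current memcg, and it reduces task_in_mem_cgroup() to
a pointer comparison.

```c
#include <stdbool.h>

/* Toy stand-ins; these are NOT the kernel's structures. */
struct memcg { const char *name; long charged; };
struct task  { struct memcg *memcg; };   /* cgroup the task is in right now */
struct mm    { struct memcg *memcg; };   /* models the patched mm->memcg */

/*
 * Replays: bprm_mm_init(), migration A -> B, copy_strings() allocation.
 * Returns whether the execing task would be found in the memcg that was
 * charged, i.e. whether the OOM killer could pick it there.
 */
static bool execer_is_oom_candidate(bool charge_via_new_mm)
{
    struct memcg a = { "A", 0 }, b = { "B", 0 };
    struct task execer = { &a };

    /* bprm_mm_init(): the new mm records the memcg of the execing task. */
    struct mm new_mm = { execer.memcg };

    /* Migration during exec: in this model new_mm is not updated
     * (the "doesn't follow migration till exec_mmap()" case). */
    execer.memcg = &b;

    /* copy_strings(): pick the memcg to charge under each scheme. */
    struct memcg *charged = charge_via_new_mm ? new_mm.memcg  /* mm->memcg */
                                              : execer.memcg; /* mm->owner */
    charged->charged += 1;

    /* Simplified task_in_mem_cgroup(): is the task in the charged memcg? */
    return execer.memcg == charged;
}
```

In this model execer_is_oom_candidate(false) is true (the charge follows the
task to B), while execer_is_oom_candidate(true) is false (A is charged but the
task now lives in B, so some other process in A has to be killed).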

> > Perhaps we can change get_mem_cgroup_from_mm() to use
> > mem_cgroup_from_css(current, memory_cgrp_id) if mm->memcg == NULL?
>
> Please God no. Having any unnecessary special case is just going to
> confuse people and cause bugs.

To me the unnecessary special case is the new_mm->memcg which is used for
accounting but doesn't follow migration till exec_mmap(). But I won't argue.

Oleg.