lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 13 Apr 2018 14:29:11 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     akpm@...ux-foundation.org, hannes@...xchg.org,
        vdavydov.dev@...il.com, cgroups@...r.kernel.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] memcg: Remove memcg_cgroup::id from IDR on
 mem_cgroup_css_alloc() failure

On 13.04.2018 14:20, Michal Hocko wrote:
> On Fri 13-04-18 14:06:40, Kirill Tkhai wrote:
>> On 13.04.2018 14:02, Michal Hocko wrote:
>>> On Fri 13-04-18 12:35:22, Kirill Tkhai wrote:
>>>> On 13.04.2018 11:55, Michal Hocko wrote:
>>>>> On Thu 12-04-18 17:52:04, Kirill Tkhai wrote:
>>>>> [...]
>>>>>> @@ -4471,6 +4477,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>>>>>>  
>>>>>>  	return &memcg->css;
>>>>>>  fail:
>>>>>> +	mem_cgroup_id_remove(memcg);
>>>>>>  	mem_cgroup_free(memcg);
>>>>>>  	return ERR_PTR(-ENOMEM);
>>>>>>  }
>>>>>
>>>>> The only path which jumps to fail: here (in the current mmotm tree) is 
>>>>> 	error = memcg_online_kmem(memcg);
>>>>> 	if (error)
>>>>> 		goto fail;
>>>>>
>>>>> AFAICS and the only failure path in memcg_online_kmem
>>>>> 	memcg_id = memcg_alloc_cache_id();
>>>>> 	if (memcg_id < 0)
>>>>> 		return memcg_id;
>>>>>
>>>>> I am not entirely clear on memcg_alloc_cache_id but it seems we do clean
>>>>> up properly. Or am I missing something?
>>>>
>>>> memcg_alloc_cache_id() may allocate a lot of memory, in case of the system reached
>>>> memcg_nr_cache_ids cgroups. In this case it iterates over all LRU lists, and double
>>>> size of every of them. In case of memory pressure it can fail. If this occurs,
>>>> mem_cgroup::id is not unhashed from IDR and we leak this id.
>>>
>>> OK, my bad I was looking at the bad code path. So you want to clean up
>>> after mem_cgroup_alloc not memcg_online_kmem. Now it makes much more
>>> sense. Sorry for the confusion on my end.
>>>
>>> Anyway, shouldn't we do the thing in mem_cgroup_free() to be symmetric
>>> to mem_cgroup_alloc?
>>
>> We can't, since it's called from mem_cgroup_css_free(), which doesn't have a deal
>> with idr freeing. All the asymmetry, we see, is because of the trick to unhash ID
>> earlier, then from mem_cgroup_css_free().
> 
> Are you sure. It's been some time since I've looked at the quite complex
> cgroup tear down code but from what I remember, css_free is called on
> the css release (aka when the reference count drops to zero). mem_cgroup_id_put_many
> seems to unpin the css reference so we should have idr_remove by the
> time when css_free is called. Or am I still wrong and should go over the
> brain hurting cgroup removal code again?

mem_cgroup_id_put_many() unpins css, but this may be not the last reference to the css.
Thus, we release ID earlier, then all references to css are freed.

You may look at the commit 73f576c04b94, and it describes the reason we do that earlier:

Author: Johannes Weiner <hannes@...xchg.org>
Date:   Wed Jul 20 15:44:57 2016 -0700

    mm: memcontrol: fix cgroup creation failure after many small jobs
    
    The memory controller has quite a bit of state that usually outlives the
    cgroup and pins its CSS until said state disappears.  At the same time
    it imposes a 16-bit limit on the CSS ID space to economically store IDs
    in the wild.  Consequently, when we use cgroups to contain frequent but
    small and short-lived jobs that leave behind some page cache, we quickly
    run into the 64k limitations of outstanding CSSs.  Creating a new cgroup
    fails with -ENOSPC while there are only a few, or even no user-visible
    cgroups in existence.
    ...

Kirill

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ