linux-kernel - Re: [PATCH 4/4] mm: memcontrol: clean up alloc, online, offline, free functions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151216121727.GL28521@esperanza>
Date:	Wed, 16 Dec 2015 15:17:27 +0300
From:	Vladimir Davydov <vdavydov@...tuozzo.com>
To:	Johannes Weiner <hannes@...xchg.org>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Michal Hocko <mhocko@...e.cz>, <linux-mm@...ck.org>,
	<cgroups@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
	<kernel-team@...com>
Subject: Re: [PATCH 4/4] mm: memcontrol: clean up alloc, online, offline,
 free functions

On Tue, Dec 15, 2015 at 02:38:58PM -0500, Johannes Weiner wrote:
> On Mon, Dec 14, 2015 at 08:14:55PM +0300, Vladimir Davydov wrote:
> > On Fri, Dec 11, 2015 at 02:54:13PM -0500, Johannes Weiner wrote:
> > ...
> > > -static int
> > > -mem_cgroup_css_online(struct cgroup_subsys_state *css)
> > > +static struct cgroup_subsys_state * __ref
> > > +mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> > >  {
> > > -	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> > > -	struct mem_cgroup *parent = mem_cgroup_from_css(css->parent);
> > > -	int ret;
> > > -
> > > -	if (css->id > MEM_CGROUP_ID_MAX)
> > > -		return -ENOSPC;
> > > +	struct mem_cgroup *parent = mem_cgroup_from_css(parent_css);
> > > +	struct mem_cgroup *memcg;
> > > +	long error = -ENOMEM;
> > >  
> > > -	if (!parent)
> > > -		return 0;
> > > +	memcg = mem_cgroup_alloc();
> > > +	if (!memcg)
> > > +		return ERR_PTR(error);
> > >  
> > >  	mutex_lock(&memcg_create_mutex);
> > 
> > It is pointless to take memcg_create_mutex in ->css_alloc. It won't
> > prevent setting use_hierarchy for parent after a new child was
> > allocated, but before it was added to the list of children (see
> > create_css()). Taking the mutex in ->css_online renders this race
> > impossible. That is, your cleanup breaks use_hierarchy consistency
> > check.
> > 
> > Can we drop this use_hierarchy consistency check at all and allow
> > children of a cgroup with use_hierarchy=1 have use_hierarchy=0? Yeah,
> > that might result in some strangeness if cgroups are created in parallel
> > with use_hierarchy flipped, but is it a valid use case? I surmise, one
> > just sets use_hierarchy for a cgroup once and for good before starting
> > to create sub-cgroups.
> 
> I don't think we have to support airtight exclusion between somebody
> changing the parent attribute and creating new children that inherit
> these attributes. Everything will still work if this race happens.
> 
> Does that mean we have to remove the restriction altogether? I'm not
> convinced. We can just keep it for historical purposes so that we do
> not *encourage* this weird setting.

Well, legacy hierarchy is scheduled to die, so it's too late to
encourage or discourage any setting regarding it.

Besides, hierarchy mode must be enabled for 99% setups, because this is
what systemd does at startup. So I don't think we would hurt anybody by
dropping this check altogether - IMO it'd be fairer than having a check
that might sometimes fail.

It's not something I really care about though, so I don't insist.

> 
> I think that's good enough. Let's just remove the memcg_create_mutex.

I'm fine with it, but I think this deserves a comment in the commit
message.

...
> So, how about the following fixlets on top to address your comments?
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index af8714a..124a802 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -250,13 +250,6 @@ enum res_type {
>  /* Used for OOM nofiier */
>  #define OOM_CONTROL		(0)
>  
> -/*
> - * The memcg_create_mutex will be held whenever a new cgroup is created.
> - * As a consequence, any change that needs to protect against new child cgroups
> - * appearing has to hold it as well.
> - */
> -static DEFINE_MUTEX(memcg_create_mutex);
> -
>  /* Some nice accessors for the vmpressure. */
>  struct vmpressure *memcg_to_vmpressure(struct mem_cgroup *memcg)
>  {
> @@ -2660,14 +2653,6 @@ static inline bool memcg_has_children(struct mem_cgroup *memcg)
>  {
>  	bool ret;
>  
> -	/*
> -	 * The lock does not prevent addition or deletion of children, but
> -	 * it prevents a new child from being initialized based on this
> -	 * parent in css_online(), so it's enough to decide whether
> -	 * hierarchically inherited attributes can still be changed or not.
> -	 */
> -	lockdep_assert_held(&memcg_create_mutex);
> -
>  	rcu_read_lock();
>  	ret = css_next_child(NULL, &memcg->css);
>  	rcu_read_unlock();
> @@ -2730,10 +2715,8 @@ static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css,
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
>  	struct mem_cgroup *parent_memcg = mem_cgroup_from_css(memcg->css.parent);
>  
> -	mutex_lock(&memcg_create_mutex);
> -
>  	if (memcg->use_hierarchy == val)
> -		goto out;
> +		return 0;
>  
>  	/*
>  	 * If parent's use_hierarchy is set, we can't make any modifications
> @@ -2752,9 +2735,6 @@ static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css,
>  	} else
>  		retval = -EINVAL;
>  
> -out:
> -	mutex_unlock(&memcg_create_mutex);
> -
>  	return retval;
>  }
>  
> @@ -2929,6 +2909,10 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
>  
>  static void memcg_free_kmem(struct mem_cgroup *memcg)
>  {
> +	/* css_alloc() failed, offlining didn't happen */
> +	if (unlikely(memcg->kmem_state == KMEM_ONLINE))

It's not a hot-path, so there's no need in using 'unlikely' here apart
from improving readability, but the comment should be enough.

> +		memcg_offline_kmem(memcg);
> +

Calling 'offline' from css_free looks a little bit awkward, but let it
be.

Anyway, it's a really nice cleanup, thanks!

Acked-by: Vladimir Davydov <vdavydov@...tuozzo.com>

>  	if (memcg->kmem_state == KMEM_ALLOCATED) {
>  		memcg_destroy_kmem_caches(memcg);
>  		static_branch_dec(&memcg_kmem_enabled_key);
> @@ -2956,11 +2940,9 @@ static int memcg_update_kmem_limit(struct mem_cgroup *memcg,
>  	mutex_lock(&memcg_limit_mutex);
>  	/* Top-level cgroup doesn't propagate from root */
>  	if (!memcg_kmem_online(memcg)) {
> -		mutex_lock(&memcg_create_mutex);
>  		if (cgroup_is_populated(memcg->css.cgroup) ||
>  		    (memcg->use_hierarchy && memcg_has_children(memcg)))
>  			ret = -EBUSY;
> -		mutex_unlock(&memcg_create_mutex);
>  		if (ret)
>  			goto out;
>  		ret = memcg_online_kmem(memcg);
> @@ -4184,14 +4166,14 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>  	if (!memcg)
>  		return ERR_PTR(error);
>  
> -	mutex_lock(&memcg_create_mutex);
>  	memcg->high = PAGE_COUNTER_MAX;
>  	memcg->soft_limit = PAGE_COUNTER_MAX;
> -	if (parent)
> +	if (parent) {
>  		memcg->swappiness = mem_cgroup_swappiness(parent);
> +		memcg->oom_kill_disable = parent->oom_kill_disable;
> +	}
>  	if (parent && parent->use_hierarchy) {
>  		memcg->use_hierarchy = true;
> -		memcg->oom_kill_disable = parent->oom_kill_disable;
>  		page_counter_init(&memcg->memory, &parent->memory);
>  		page_counter_init(&memcg->memsw, &parent->memsw);
>  		page_counter_init(&memcg->kmem, &parent->kmem);
> @@ -4209,7 +4191,6 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>  		if (parent != root_mem_cgroup)
>  			memory_cgrp_subsys.broken_hierarchy = true;
>  	}
> -	mutex_unlock(&memcg_create_mutex);
>  
>  	/* The following stuff does not apply to the root */
>  	if (!parent) {
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/