Date:   Mon, 11 Mar 2019 13:25:26 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Roman Gushchin <guroan@...il.com>
Cc:     linux-mm@...ck.org, kernel-team@...com,
        linux-kernel@...r.kernel.org, Tejun Heo <tj@...nel.org>,
        Rik van Riel <riel@...riel.com>,
        Michal Hocko <mhocko@...nel.org>, Roman Gushchin <guro@...com>
Subject: Re: [PATCH 3/5] mm: release memcg percpu data prematurely

On Thu, Mar 07, 2019 at 03:00:31PM -0800, Roman Gushchin wrote:
> To reduce the memory footprint of a dying memory cgroup, let's
> release massive percpu data (vmstats_percpu) as early as possible,
> and use atomic counterparts instead.
> 
> A dying cgroup can remain in the dying state for quite a long
> time, being pinned in memory by any reference. For example,
> if a page mlocked by some other cgroup is charged to the dying
> cgroup, it won't go away until the page is released.
> 
> A dying memory cgroup can have some memory activity (e.g. dirty
> pages can be flushed after cgroup removal), but in general it's
> not expected to be very active in comparison to living cgroups.
> 
> So reducing the memory footprint by releasing percpu data
> and switching over to atomics seems to be a good trade off.
> 
> Signed-off-by: Roman Gushchin <guro@...com>

Acked-by: Johannes Weiner <hannes@...xchg.org>

One nitpick below:

> @@ -4612,6 +4612,26 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
>  	return 0;
>  }
>  
> +static void mem_cgroup_free_percpu(struct rcu_head *rcu)
> +{
> +	struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);
> +
> +	free_percpu(memcg->vmstats_percpu_offlined);
> +	WARN_ON_ONCE(memcg->vmstats_percpu);
> +
> +	css_put(&memcg->css);

Nitpick: I had to do a double take seeing a "mem_cgroup_free_*" function
that does css_put(). We use "free" terminology (mem_cgroup_css_free,
memcg_free_kmem, memcg_free_shrinker_maps, mem_cgroup_free) from the
.css_free callback, which only happens when the last reference is put.

Can we go with something less ambiguous? We can add "rcu" and drop the
mem_cgroup prefix since it's narrowly scoped. "percpu_rcu_free"?
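
Purely as illustration, a hypothetical sketch of that rename (it just
mirrors the body you posted above, nothing else changes):

	static void percpu_rcu_free(struct rcu_head *rcu)
	{
		struct mem_cgroup *memcg = container_of(rcu, struct mem_cgroup, rcu);

		/* Free the stashed percpu data; the live pointer was cleared
		 * before the grace period, so it should still be NULL here. */
		free_percpu(memcg->vmstats_percpu_offlined);
		WARN_ON_ONCE(memcg->vmstats_percpu);

		/* Drop the reference taken when the callback was scheduled. */
		css_put(&memcg->css);
	}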

> +static void mem_cgroup_offline_percpu(struct mem_cgroup *memcg)
> +{
> +	memcg->vmstats_percpu_offlined = (struct memcg_vmstats_percpu __percpu*)
> +		rcu_dereference(memcg->vmstats_percpu);
> +	rcu_assign_pointer(memcg->vmstats_percpu, NULL);
> +
> +	css_get(&memcg->css);
> +	call_rcu(&memcg->rcu, mem_cgroup_free_percpu);
> +}
> +
>  static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
>  {
>  	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
