Message-ID: <ZrYKVtDXzGXaZ7wH@tiehlicka>
Date: Fri, 9 Aug 2024 14:23:50 +0200
From: Michal Hocko <mhocko@...e.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Meta kernel team <kernel-team@...a.com>, cgroups@...r.kernel.org
Subject: Re: [PATCH] memcg: protect concurrent access to mem_cgroup_idr
On Fri 02-08-24 16:58:22, Shakeel Butt wrote:
> The commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure
> after many small jobs") decoupled the memcg IDs from the CSS ID space to
> fix cgroup creation failures. It introduced an IDR to maintain the memcg
> ID space. The IDR depends on external synchronization mechanisms for
> modifications. For mem_cgroup_idr, idr_alloc() and idr_replace() happen
> within css callbacks and thus are protected by cgroup_mutex against
> concurrent modifications. However, idr_remove() for mem_cgroup_idr was
> not protected and can run concurrently for different memcgs when their
> refcounts drop to zero. Fix that.
>
> We have been seeing list_lru based kernel crashes at a low frequency in
> our fleet for a long time. These crashes were in different parts of the
> list_lru code, including list_lru_add(), list_lru_del() and the
> reparenting code. Upon further inspection, it looked like for a given
> object (dentry or inode), the super_block's list_lru did not have a
> list_lru_one for that object's memcg. The initial suspicion was that
> either the object was not allocated through kmem_cache_alloc_lru() or
> that memcg_list_lru_alloc() somehow failed to allocate a list_lru_one
> for a memcg but still returned success. No evidence was found for
> either case.
>
> Digging deeper, we started seeing situations where a valid memcg's id
> was not present in mem_cgroup_idr, and in some cases multiple valid
> memcgs had the same id with mem_cgroup_idr pointing to only one of them.
> The most reasonable explanation is a race between multiple idr_remove()
> calls, or between idr_alloc()/idr_replace() and idr_remove(). Such races
> let multiple memcgs acquire the same ID, and offlining one of them then
> cleans up the list_lrus on the system for all of them. Later accesses to
> the list_lru from the other memcgs crash due to the missing
> list_lru_one.
>
> Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
> Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
Sorry for being late here. Thanks for catching this.
Acked-by: Michal Hocko <mhocko@...e.com>
Cc: stable is definitely due here as those races are really nasty to
debug.
Thanks!
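
For reference, the racy pre-fix removal path looks roughly like this (the
comments are mine, only illustrating the race described in the changelog):

static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
{
        if (memcg->id.id > 0) {
                /*
                 * Nothing serializes this against other IDR modifiers:
                 * another CPU can be in idr_remove() for a different dying
                 * memcg, or in idr_alloc()/idr_replace() under cgroup_mutex,
                 * at the same time. Concurrent modifications can corrupt
                 * the tree and later let idr_alloc() hand out an ID that
                 * is still in use.
                 */
                idr_remove(&mem_cgroup_idr, memcg->id.id);
                memcg->id.id = 0;
        }
}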
> ---
> mm/memcontrol.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b889a7fbf382..8971d3473a7b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3364,11 +3364,28 @@ static void memcg_wb_domain_size_changed(struct mem_cgroup *memcg)
>
> #define MEM_CGROUP_ID_MAX ((1UL << MEM_CGROUP_ID_SHIFT) - 1)
> static DEFINE_IDR(mem_cgroup_idr);
> +static DEFINE_SPINLOCK(memcg_idr_lock);
> +
> +static int mem_cgroup_alloc_id(void)
> +{
> + int ret;
> +
> + idr_preload(GFP_KERNEL);
> + spin_lock(&memcg_idr_lock);
> + ret = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX + 1,
> + GFP_NOWAIT);
> + spin_unlock(&memcg_idr_lock);
> + idr_preload_end();
> + return ret;
> +}
>
> static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
> {
> if (memcg->id.id > 0) {
> + spin_lock(&memcg_idr_lock);
> idr_remove(&mem_cgroup_idr, memcg->id.id);
> + spin_unlock(&memcg_idr_lock);
> +
> memcg->id.id = 0;
> }
> }
> @@ -3502,8 +3519,7 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
> if (!memcg)
> return ERR_PTR(error);
>
> - memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
> - 1, MEM_CGROUP_ID_MAX + 1, GFP_KERNEL);
> + memcg->id.id = mem_cgroup_alloc_id();
> if (memcg->id.id < 0) {
> error = memcg->id.id;
> goto fail;
> @@ -3648,7 +3664,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
> * publish it here at the end of onlining. This matches the
> * regular ID destruction during offlining.
> */
> + spin_lock(&memcg_idr_lock);
> idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
> + spin_unlock(&memcg_idr_lock);
>
> return 0;
> offline_kmem:
> --
> 2.43.5
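
In case the shape of the new mem_cgroup_alloc_id() looks odd to anyone:
a sleeping GFP_KERNEL allocation is not allowed under a spinlock, so the
patch preallocates with idr_preload() outside the lock and then allocates
with GFP_NOWAIT inside it. A generic sketch of that convention (the "foo"
names below are made up for illustration, this is not memcg code):

static DEFINE_IDR(foo_idr);
static DEFINE_SPINLOCK(foo_idr_lock);

static int foo_alloc_id(void *ptr)
{
        int id;

        /* Preallocate tree nodes so the locked section can use GFP_NOWAIT. */
        idr_preload(GFP_KERNEL);
        spin_lock(&foo_idr_lock);
        id = idr_alloc(&foo_idr, ptr, 1, 0, GFP_NOWAIT);
        spin_unlock(&foo_idr_lock);
        idr_preload_end();

        return id;
}

All IDR modifications (idr_alloc(), idr_replace(), idr_remove()) need the
same external serialization; only lookups can get away with just RCU.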
--
Michal Hocko
SUSE Labs