Message-ID: <ZrYKVtDXzGXaZ7wH@tiehlicka>
Date: Fri, 9 Aug 2024 14:23:50 +0200
From: Michal Hocko <mhocko@...e.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Meta kernel team <kernel-team@...a.com>, cgroups@...r.kernel.org
Subject: Re: [PATCH] memcg: protect concurrent access to mem_cgroup_idr
On Fri 02-08-24 16:58:22, Shakeel Butt wrote:
> The commit 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure
> after many small jobs") decoupled the memcg IDs from the CSS ID space to
> fix cgroup creation failures. It introduced an IDR to maintain the memcg
> ID space. The IDR depends on external synchronization mechanisms for
> modifications. For mem_cgroup_idr, idr_alloc() and idr_replace() happen
> within css callbacks and thus are protected by cgroup_mutex against
> concurrent modifications. However, idr_remove() for mem_cgroup_idr was
> not protected and can run concurrently for different memcgs when their
> refcounts drop to zero. Fix that.
>
> We have been seeing list_lru based kernel crashes at a low frequency in
> our fleet for a long time. These crashes were in different parts of the
> list_lru code, including list_lru_add(), list_lru_del() and the
> reparenting code. Upon further inspection, it looked like for a given
> object (dentry or inode), the super_block's list_lru did not have a
> list_lru_one for that object's memcg. The initial suspicion was that
> either the object was not allocated through kmem_cache_alloc_lru() or
> that memcg_list_lru_alloc() somehow failed to allocate a list_lru_one
> for a memcg but still returned success. No evidence was found for
> either case.
>
> Digging deeper, we started seeing situations where a valid memcg's id
> was not present in mem_cgroup_idr, and in some cases multiple valid
> memcgs had the same id with mem_cgroup_idr pointing to only one of them.
> The most reasonable explanation is a race between multiple idr_remove()
> calls, or between idr_alloc()/idr_replace() and idr_remove(). Such races
> let multiple memcgs acquire the same ID, and offlining one of them then
> cleans up the list_lrus on the system for all of them. Later accesses to
> the list_lru from the other memcgs crash due to the missing
> list_lru_one.
>
> Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs")
> Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
Sorry for being late here. Thanks for catching this.
Acked-by: Michal Hocko <mhocko@...e.com>
Cc: stable is definitely due here as those races are really nasty to
debug.
Thanks!
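
For reference, the racy pre-fix removal path looks roughly like this (the
comments are mine, only illustrating the race described in the changelog):

static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
{
        if (memcg->id.id > 0) {
                /*
                 * Nothing serializes this against other IDR modifiers:
                 * another CPU can be in idr_remove() for a different dying
                 * memcg, or in idr_alloc()/idr_replace() under cgroup_mutex,
                 * at the same time. Concurrent modifications can corrupt
                 * the tree and later let idr_alloc() hand out an ID that
                 * is still in use.
                 */
                idr_remove(&mem_cgroup_idr, memcg->id.id);
                memcg->id.id = 0;
        }
}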
> ---
> mm/memcontrol.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b889a7fbf382..8971d3473a7b 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3364,11 +3364,28 @@ static void memcg_wb_domain_size_changed(struct mem_cgroup *memcg)
>
> #define MEM_CGROUP_ID_MAX ((1UL << MEM_CGROUP_ID_SHIFT) - 1)
> static DEFINE_IDR(mem_cgroup_idr);
> +static DEFINE_SPINLOCK(memcg_idr_lock);
> +
> +static int mem_cgroup_alloc_id(void)
> +{
> + int ret;
> +
> + idr_preload(GFP_KERNEL);
> + spin_lock(&memcg_idr_lock);
> + ret = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX + 1,
> + GFP_NOWAIT);
> + spin_unlock(&memcg_idr_lock);
> + idr_preload_end();
> + return ret;
> +}
>
> static void mem_cgroup_id_remove(struct mem_cgroup *memcg)
> {
> if (memcg->id.id > 0) {
> + spin_lock(&memcg_idr_lock);
> idr_remove(&mem_cgroup_idr, memcg->id.id);
> + spin_unlock(&memcg_idr_lock);
> +
> memcg->id.id = 0;
> }
> }
> @@ -3502,8 +3519,7 @@ static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
> if (!memcg)
> return ERR_PTR(error);
>
> - memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL,
> - 1, MEM_CGROUP_ID_MAX + 1, GFP_KERNEL);
> + memcg->id.id = mem_cgroup_alloc_id();
> if (memcg->id.id < 0) {
> error = memcg->id.id;
> goto fail;
> @@ -3648,7 +3664,9 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
> * publish it here at the end of onlining. This matches the
> * regular ID destruction during offlining.
> */
> + spin_lock(&memcg_idr_lock);
> idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
> + spin_unlock(&memcg_idr_lock);
>
> return 0;
> offline_kmem:
> --
> 2.43.5
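
In case the shape of the new mem_cgroup_alloc_id() looks odd to anyone:
a sleeping GFP_KERNEL allocation is not allowed under a spinlock, so the
patch preallocates with idr_preload() outside the lock and then allocates
with GFP_NOWAIT inside it. A generic sketch of that convention (the "foo"
names below are made up for illustration, this is not memcg code):

static DEFINE_IDR(foo_idr);
static DEFINE_SPINLOCK(foo_idr_lock);

static int foo_alloc_id(void *ptr)
{
        int id;

        /* Preallocate tree nodes so the locked section can use GFP_NOWAIT. */
        idr_preload(GFP_KERNEL);
        spin_lock(&foo_idr_lock);
        id = idr_alloc(&foo_idr, ptr, 1, 0, GFP_NOWAIT);
        spin_unlock(&foo_idr_lock);
        idr_preload_end();

        return id;
}

All IDR modifications (idr_alloc(), idr_replace(), idr_remove()) need the
same external serialization; only lookups can get away with just RCU.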
--
Michal Hocko
SUSE Labs