linux-kernel - Re: [PATCH FIX v0] mm: memcg/slab: Uncharge during kmem_cache_free

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201012033326.GJ185637@in.ibm.com>
Date:   Mon, 12 Oct 2020 09:03:26 +0530
From:   Bharata B Rao <bharata@...ux.ibm.com>
To:     Roman Gushchin <guro@...com>
Cc:     linux-kernel@...r.kernel.org, cl@...ux.com, rientjes@...gle.com,
        iamjoonsoo.kim@....com, akpm@...ux-foundation.org,
        linux-mm@...ck.org
Subject: Re: [PATCH FIX v0] mm: memcg/slab: Uncharge during
 kmem_cache_free_bulk()

On Fri, Oct 09, 2020 at 11:45:51AM -0700, Roman Gushchin wrote:
> On Fri, Oct 09, 2020 at 11:34:23AM +0530, Bharata B Rao wrote:
> 
> Hi Bharata,
> 
> > Object cgroup charging is done for all the objects during
> > allocation, but during freeing, uncharging ends up happening
> > for only one object in the case of bulk allocation/freeing.
> 
> Yes, it's definitely a problem. Thank you for catching it!
> 
> I'm curious, did you find it in the wild or by looking into the code?

Found by looking at the code.

> 
> > 
> > Fix this by having a separate call to uncharge all the
> > objects from kmem_cache_free_bulk() and by modifying
> > memcg_slab_free_hook() to take care of bulk uncharging.
> >
> > Signed-off-by: Bharata B Rao <bharata@...ux.ibm.com>
> 
> Please, add:
> 
> Fixes: 964d4bd370d5 ("mm: memcg/slab: save obj_cgroup for non-root slab objects")
> Cc: stable@...r.kernel.org
> 
> > ---
> >  mm/slab.c |  2 +-
> >  mm/slab.h | 42 +++++++++++++++++++++++++++---------------
> >  mm/slub.c |  3 ++-
> >  3 files changed, 30 insertions(+), 17 deletions(-)
> > 
> > diff --git a/mm/slab.c b/mm/slab.c
> > index f658e86ec8cee..5c70600d8b1cc 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3440,7 +3440,7 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
> >  		memset(objp, 0, cachep->object_size);
> >  	kmemleak_free_recursive(objp, cachep->flags);
> >  	objp = cache_free_debugcheck(cachep, objp, caller);
> > -	memcg_slab_free_hook(cachep, virt_to_head_page(objp), objp);
> > +	memcg_slab_free_hook(cachep, &objp, 1);
> >  
> >  	/*
> >  	 * Skip calling cache_free_alien() when the platform is not numa.
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 6cc323f1313af..6dd4b702888a7 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -345,30 +345,42 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> >  	obj_cgroup_put(objcg);
> >  }
> >  
> > -static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
> > -					void *p)
> > +static inline void memcg_slab_free_hook(struct kmem_cache *s_orig,
> > +					void **p, int objects)
> >  {
> > +	struct kmem_cache *s;
> >  	struct obj_cgroup *objcg;
> > +	struct page *page;
> >  	unsigned int off;
> > +	int i;
> >  
> >  	if (!memcg_kmem_enabled())
> >  		return;
> >  
> > -	if (!page_has_obj_cgroups(page))
> > -		return;
> > +	for (i = 0; i < objects; i++) {
> > +		if (unlikely(!p[i]))
> > +			continue;
> >  
> > -	off = obj_to_index(s, page, p);
> > -	objcg = page_obj_cgroups(page)[off];
> > -	page_obj_cgroups(page)[off] = NULL;
> > +		page = virt_to_head_page(p[i]);
> > +		if (!page_has_obj_cgroups(page))
> > +			continue;
> >  
> > -	if (!objcg)
> > -		return;
> > +		if (!s_orig)
> > +			s = page->slab_cache;
> > +		else
> > +			s = s_orig;
> >  
> > -	obj_cgroup_uncharge(objcg, obj_full_size(s));
> > -	mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s),
> > -			-obj_full_size(s));
> > +		off = obj_to_index(s, page, p[i]);
> > +		objcg = page_obj_cgroups(page)[off];
> > +		if (!objcg)
> > +			continue;
> >  
> > -	obj_cgroup_put(objcg);
> > +		page_obj_cgroups(page)[off] = NULL;
> > +		obj_cgroup_uncharge(objcg, obj_full_size(s));
> > +		mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s),
> > +				-obj_full_size(s));
> > +		obj_cgroup_put(objcg);
> > +	}
> >  }
> >  
> >  #else /* CONFIG_MEMCG_KMEM */
> > @@ -406,8 +418,8 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> >  {
> >  }
> >  
> > -static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
> > -					void *p)
> > +static inline void memcg_slab_free_hook(struct kmem_cache *s,
> > +					void **p, int objects)
> >  {
> >  }
> >  #endif /* CONFIG_MEMCG_KMEM */
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 6d3574013b2f8..0cbe67f13946e 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3091,7 +3091,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
> >  	struct kmem_cache_cpu *c;
> >  	unsigned long tid;
> >  
> > -	memcg_slab_free_hook(s, page, head);
> > +	memcg_slab_free_hook(s, &head, 1);
> 
> Hm, I wonder if it's better to teach do_slab_free() to handle the (cnt > 1) case correctly?

Possible, but we will have to walk the detached freelist there while
here it is much easier to just walk the array of objects?

> 
> >  redo:
> >  	/*
> >  	 * Determine the currently cpus per cpu slab.
> > @@ -3253,6 +3253,7 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> >  	if (WARN_ON(!size))
> >  		return;
> >  
> > +	memcg_slab_free_hook(s, p, size);
> 
> Then you don't need this change.
> 
> Otherwise memcg_slab_free_hook() can be called twice for the same object. It's ok from
> accounting correctness perspective, because the first call will zero out the objcg pointer,
> but still much better to be avoided.

Yes, for that one object there will be one additional uncharge call,
but as you note it is handled by the !objcg check.

Regards,
Bharata.