Message-ID: <yiyx4fh6dklqpexfstkzp3gf23hjpbjujci2o6gs7nb4sutzvb@b5korjrjio3m>
Date: Tue, 27 Aug 2024 10:23:18 -0700
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
Muchun Song <muchun.song@...ux.dev>, Vlastimil Babka <vbabka@...e.cz>,
David Rientjes <rientjes@...gle.com>, Hyeonggon Yoo <42.hyeyoo@...il.com>,
Eric Dumazet <edumazet@...gle.com>, "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Meta kernel team <kernel-team@...a.com>, cgroups@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH v1] memcg: add charging of already allocated slab objects
On Tue, Aug 27, 2024 at 03:06:32AM GMT, Roman Gushchin wrote:
> On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote:
> > At the moment, slab objects are charged to the memcg at allocation
> > time. However, there are cases where slab objects are allocated at a
> > time when the right target memcg to charge them to is not known. One
> > such case is network sockets for incoming connections, which are
> > allocated in softirq context.
> >
> > A couple hundred thousand connections are very normal on a large
> > loaded server, and almost all of the sockets underlying those
> > connections are allocated in softirq context and thus not charged to
> > any memcg. However, later at accept() time we know the right target
> > memcg to charge. Let's add a new API to charge already allocated
> > objects, so we can have better accounting of memory usage.
> >
> > To measure the performance impact of this change, tcp_crr from the
> > neper [1] performance suite is used. Basically it is a network ping
> > pong test with a new connection for each ping pong.
> >
> > The server and the client are run inside a 3-level cgroup hierarchy
> > using the following commands:
> >
> > Server:
> > $ tcp_crr -6
> >
> > Client:
> > $ tcp_crr -6 -c -H ${server_ip}
> >
> > If the client and server run on different machines with a 50 Gbps NIC,
> > there is no visible impact from the change.
> >
> > For the same-machine experiment, with v6.11-rc5 as the base:
> >
> >              base (throughput)    with-patch
> > tcp_crr      14545 (+- 80)        14463 (+- 56)
> >
> > It seems like the performance impact is within the noise.
> >
> > Link: https://github.com/google/neper [1]
> > Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
>
> Hi Shakeel,
>
> I like the idea and the performance numbers look good. However, I have
> some comments on the implementation:
>
Thanks for taking a look.
> > ---
> >
> > Changes since the RFC:
> > - Added check for already charged slab objects.
> > - Added performance results from neper's tcp_crr
> >
> > include/linux/slab.h | 1 +
> > mm/slub.c | 54 +++++++++++++++++++++++++++++++++
> > net/ipv4/inet_connection_sock.c | 5 +--
> > 3 files changed, 58 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index eb2bf4629157..05cfab107c72 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
> > gfp_t gfpflags) __assume_slab_alignment __malloc;
> > #define kmem_cache_alloc_lru(...) alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
> >
> > +bool kmem_cache_charge(void *objp, gfp_t gfpflags);
> > void kmem_cache_free(struct kmem_cache *s, void *objp);
> >
> > kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
> > diff --git a/mm/slub.c b/mm/slub.c
> > index c9d8a2497fd6..580683597b5c 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> >
> > __memcg_slab_free_hook(s, slab, p, objects, obj_exts);
> > }
> > +
> > +static __fastpath_inline
> > +bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags)
> > +{
> > + if (likely(!memcg_kmem_online()))
> > + return true;
>
> We do have this check in kmem_cache_charge(), why do we need to check it again?
>
I missed removing this one. I am going to rearrange the code a bit more
in these functions to avoid build errors in non-MEMCG builds.
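Roughly, the plan (an untested sketch, reusing the hunks from this
patch) is to keep the memcg_kmem_online() check only in
kmem_cache_charge() and to provide a stub for !CONFIG_MEMCG builds:

#ifdef CONFIG_MEMCG
static __fastpath_inline
bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags)
{
	return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
}
#else
static inline bool memcg_slab_post_charge(struct kmem_cache *s, void *p,
					  gfp_t flags)
{
	return true;
}
#endif /* CONFIG_MEMCG */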
> > +
> > + return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
> > +}
> > +
> > #else /* CONFIG_MEMCG */
> > static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > struct list_lru *lru,
> > @@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> > void **p, int objects)
> > {
> > }
> > +
> > +static inline bool memcg_slab_post_charge(struct kmem_cache *s,
> > + void *p,
> > + gfp_t flags)
> > +{
> > + return true;
> > +}
> > #endif /* CONFIG_MEMCG */
> >
> > /*
> > @@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
> > }
> > EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
> >
> > +#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
> > + SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
> > +
> > +bool kmem_cache_charge(void *objp, gfp_t gfpflags)
> > +{
> > + struct slabobj_ext *slab_exts;
> > + struct kmem_cache *s;
> > + struct folio *folio;
> > + struct slab *slab;
> > + unsigned long off;
> > +
> > + if (!memcg_kmem_online())
> > + return true;
> > +
> > + folio = virt_to_folio(objp);
> > + if (unlikely(!folio_test_slab(folio)))
> > + return false;
>
> Does it handle the case of a too-big-to-be-a-slab-object allocation?
> I think it's better to handle it properly. Also, why return false here?
>
Yes, I will fix the too-big-to-be-a-slab-object allocations. I presume I
should just follow the kfree() handling on !folio_test_slab(), i.e.
treat the given object as a large (too-big-to-be-a-slab-object) kmalloc
allocation.
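Something along these lines, I think (an untested sketch; assuming
__memcg_kmem_charge_page() is the right way to charge a large kmalloc
folio):

	folio = virt_to_folio(objp);
	if (unlikely(!folio_test_slab(folio))) {
		/* Large kmalloc object: charge the whole folio unless
		 * it is already charged.
		 */
		if (folio_memcg_kmem(folio))
			return true;
		return __memcg_kmem_charge_page(folio_page(folio, 0),
						gfpflags,
						folio_order(folio)) == 0;
	}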
> > +
> > + slab = folio_slab(folio);
> > + s = slab->slab_cache;
> > +
> > + /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
> > + if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC)
> > + return true;
>
> And true here? It seems to be a bit inconsistent.
It will be consistent after the too-big-to-be-a-slab-object case is handled.
> Also, if we have this check here, it means your function won't handle kmallocs
> at all? Because !KMALLOC_NORMAL allocations won't get here.
The non-KMALLOC_NORMAL kmalloc caches should also have one of the
SLAB_CACHE_DMA, SLAB_ACCOUNT or SLAB_RECLAIM_ACCOUNT flags set, so the
above check will only be true for KMALLOC_NORMAL caches.
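In other words, the check is effectively the following
(is_kmalloc_normal() is just a hypothetical name for illustration):

static inline bool is_kmalloc_normal(struct kmem_cache *s)
{
	/*
	 * All kmalloc caches have SLAB_KMALLOC set; the dma, cgroup and
	 * reclaim variants additionally carry SLAB_CACHE_DMA,
	 * SLAB_ACCOUNT or SLAB_RECLAIM_ACCOUNT respectively, so the
	 * equality only holds for KMALLOC_NORMAL caches.
	 */
	return (s->flags & KMALLOC_TYPE) == SLAB_KMALLOC;
}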
>
> > +
> > + /* Ignore already charged objects. */
> > + slab_exts = slab_obj_exts(slab);
> > + if (slab_exts) {
> > + off = obj_to_index(s, slab, objp);
> > + if (unlikely(slab_exts[off].objcg))
> > + return true;
> > + }
> > +
> > + return memcg_slab_post_charge(s, objp, gfpflags);
> > +}
> > +EXPORT_SYMBOL(kmem_cache_charge);
> > +
> > /**
> > * kmem_cache_alloc_node - Allocate an object on the specified node
> > * @s: The cache to allocate from.
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index 64d07b842e73..3c13ca8c11fb 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
> > release_sock(sk);
> > if (newsk && mem_cgroup_sockets_enabled) {
> > int amt = 0;
> > + gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
> >
> > /* atomically get the memory usage, set and charge the
> > * newsk->sk_memcg.
> > @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
> > }
> >
> > if (amt)
> > - mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
> > - GFP_KERNEL | __GFP_NOFAIL);
> > + mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
> > + kmem_cache_charge(newsk, gfp);
>
> Wait, so we assume that newsk->sk_memcg === current memcg? Or we're ok with them being
> different?
We set newsk->sk_memcg in the same function (see the call to
mem_cgroup_sk_alloc(newsk) a couple of lines above), and accept() runs
in process context, so newsk->sk_memcg will be equal to the current
task's memcg.
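For reference, the relevant flow in inet_csk_accept() with this patch
is roughly (condensed):

	if (newsk && mem_cgroup_sockets_enabled) {
		int amt = 0;
		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;

		lock_sock(newsk);

		/* Associates newsk with the current task's memcg. */
		mem_cgroup_sk_alloc(newsk);
		if (newsk->sk_memcg)
			amt = sk_mem_pages(newsk->sk_forward_alloc +
					   atomic_read(&newsk->sk_rmem_alloc));

		if (amt)
			mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
		/* Charge the sock object itself to the current memcg. */
		kmem_cache_charge(newsk, gfp);

		release_sock(newsk);
	}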
Thanks a lot for the valuable feedback.
Shakeel