[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Zs1CuLa-SE88jRVx@google.com>
Date: Tue, 27 Aug 2024 03:06:32 +0000
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Muchun Song <muchun.song@...ux.dev>,
Vlastimil Babka <vbabka@...e.cz>,
David Rientjes <rientjes@...gle.com>,
Hyeonggon Yoo <42.hyeyoo@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Meta kernel team <kernel-team@...a.com>, cgroups@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH v1] memcg: add charging of already allocated slab objects
On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote:
> At the moment, the slab objects are charged to the memcg at the
> allocation time. However there are cases where slab objects are
> allocated at the time where the right target memcg to charge it to is
> not known. One such case is the network sockets for the incoming
> connection which are allocated in the softirq context.
>
> Couple hundred thousand connections are very normal on large loaded
> server and almost all of those sockets underlying those connections get
> allocated in the softirq context and thus not charged to any memcg.
> However later at the accept() time we know the right target memcg to
> charge. Let's add new API to charge already allocated objects, so we can
> have better accounting of the memory usage.
>
> To measure the performance impact of this change, tcp_crr is used from
> the neper [1] performance suite. Basically it is a network ping pong
> test with new connection for each ping pong.
>
> The server and the client are run inside 3 level of cgroup hierarchy
> using the following commands:
>
> Server:
> $ tcp_crr -6
>
> Client:
> $ tcp_crr -6 -c -H ${server_ip}
>
> If the client and server run on different machines with 50 GBPS NIC,
> there is no visible impact of the change.
>
> For the same machine experiment with v6.11-rc5 as base.
>
> base (throughput) with-patch
> tcp_crr 14545 (+- 80) 14463 (+- 56)
>
> It seems like the performance impact is within the noise.
>
> Link: https://github.com/google/neper [1]
> Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
Hi Shakeel,
I like the idea and performance numbers look good. However some comments on
the implementation:
> ---
>
> Changes since the RFC:
> - Added check for already charged slab objects.
> - Added performance results from neper's tcp_crr
>
> include/linux/slab.h | 1 +
> mm/slub.c | 54 +++++++++++++++++++++++++++++++++
> net/ipv4/inet_connection_sock.c | 5 +--
> 3 files changed, 58 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index eb2bf4629157..05cfab107c72 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
> gfp_t gfpflags) __assume_slab_alignment __malloc;
> #define kmem_cache_alloc_lru(...) alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
>
> +bool kmem_cache_charge(void *objp, gfp_t gfpflags);
> void kmem_cache_free(struct kmem_cache *s, void *objp);
>
> kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
> diff --git a/mm/slub.c b/mm/slub.c
> index c9d8a2497fd6..580683597b5c 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
>
> __memcg_slab_free_hook(s, slab, p, objects, obj_exts);
> }
> +
> +static __fastpath_inline
> +bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags)
> +{
> + if (likely(!memcg_kmem_online()))
> + return true;
We do have this check in kmem_cache_charge(), why do we need to check it again?
> +
> + return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
> +}
> +
> #else /* CONFIG_MEMCG */
> static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
> struct list_lru *lru,
> @@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
> void **p, int objects)
> {
> }
> +
> +static inline bool memcg_slab_post_charge(struct kmem_cache *s,
> + void *p,
> + gfp_t flags)
> +{
> + return true;
> +}
> #endif /* CONFIG_MEMCG */
>
> /*
> @@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
> }
> EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
>
> +#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
> + SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
> +
> +bool kmem_cache_charge(void *objp, gfp_t gfpflags)
> +{
> + struct slabobj_ext *slab_exts;
> + struct kmem_cache *s;
> + struct folio *folio;
> + struct slab *slab;
> + unsigned long off;
> +
> + if (!memcg_kmem_online())
> + return true;
> +
> + folio = virt_to_folio(objp);
> + if (unlikely(!folio_test_slab(folio)))
> + return false;
Does it handle the case of a too-big-to-be-a-slab-object allocation?
I think it's better to handle it properly. Also, why return false here?
> +
> + slab = folio_slab(folio);
> + s = slab->slab_cache;
> +
> + /* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
> + if ((s->flags & KMALLOC_TYPE) == SLAB_KMALLOC)
> + return true;
And true here? It seems to be a bit inconsistent.
Also, if we have this check here, it means your function won't handle kmallocs
at all? Because !KMALLOC_NORMAL allocations won't get here.
> +
> + /* Ignore already charged objects. */
> + slab_exts = slab_obj_exts(slab);
> + if (slab_exts) {
> + off = obj_to_index(s, slab, objp);
> + if (unlikely(slab_exts[off].objcg))
> + return true;
> + }
> +
> + return memcg_slab_post_charge(s, objp, gfpflags);
> +}
> +EXPORT_SYMBOL(kmem_cache_charge);
> +
> /**
> * kmem_cache_alloc_node - Allocate an object on the specified node
> * @s: The cache to allocate from.
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 64d07b842e73..3c13ca8c11fb 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
> release_sock(sk);
> if (newsk && mem_cgroup_sockets_enabled) {
> int amt = 0;
> + gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
>
> /* atomically get the memory usage, set and charge the
> * newsk->sk_memcg.
> @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
> }
>
> if (amt)
> - mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
> - GFP_KERNEL | __GFP_NOFAIL);
> + mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
> + kmem_cache_charge(newsk, gfp);
Wait, so we assume that newsk->sk_memcg === current memcg? Or we're ok with them being
different?
Thanks!
Powered by blists - more mailing lists