netdev - Re: [PATCH v4] memcg: add charging of already allocated slab objects

Message-ID: <5d48fe26-1e18-4941-99d4-8c03e83f5e76@redhat.com>
Date: Tue, 10 Sep 2024 10:26:36 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>,
 Andrew Morton <akpm@...ux-foundation.org>, Vlastimil Babka <vbabka@...e.cz>
Cc: Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
 Roman Gushchin <roman.gushchin@...ux.dev>,
 Muchun Song <muchun.song@...ux.dev>, David Rientjes <rientjes@...gle.com>,
 Hyeonggon Yoo <42.hyeyoo@...il.com>, Eric Dumazet <edumazet@...gle.com>,
 "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
 linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 Meta kernel team <kernel-team@...a.com>, cgroups@...r.kernel.org,
 netdev@...r.kernel.org
Subject: Re: [PATCH v4] memcg: add charging of already allocated slab objects

On 9/5/24 19:34, Shakeel Butt wrote:
> At the moment, slab objects are charged to the memcg at
> allocation time. However, there are cases where slab objects are
> allocated at a time when the right target memcg to charge them to is
> not known. One such case is the network sockets for incoming
> connections, which are allocated in the softirq context.
> 
> A couple hundred thousand connections are very normal on a large loaded
> server, and almost all of the sockets underlying those connections get
> allocated in the softirq context and are thus not charged to any memcg.
> However, later at accept() time we know the right target memcg to
> charge. Let's add a new API to charge already allocated objects, so we can
> have better accounting of the memory usage.
> 
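
The deferred-charging idea described above can be sketched in userspace C.
This is only a rough model, not the kernel implementation: `struct group`,
`struct object`, and `charge_object()` are hypothetical stand-ins for a memcg,
a slab object, and the new charging API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg: just a byte counter. */
struct group {
	const char *name;
	size_t charged;
};

/* Hypothetical stand-in for a slab object with per-object owner metadata. */
struct object {
	size_t size;
	struct group *owner;	/* NULL while the object is uncharged */
};

/*
 * Charge an already allocated object to @target.
 * Returns true on success, and also when the object is already charged,
 * so that a repeated call never double charges.
 */
static bool charge_object(struct object *obj, struct group *target)
{
	assert(obj != NULL && target != NULL);

	if (obj->owner)
		return true;	/* already charged: nothing to do */

	target->charged += obj->size;
	obj->owner = target;
	return true;
}
```

In this model, an object created with `owner == NULL` plays the role of a
socket allocated in softirq context, and the later `charge_object()` call plays
the role of the charge at accept() time. A second call on the same object
leaves the group's counter unchanged.
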
> To measure the performance impact of this change, tcp_crr is used from
> the neper [1] performance suite. Basically, it is a network ping-pong
> test with a new connection for each ping-pong.
> 
> The server and the client are run inside a 3-level cgroup hierarchy
> using the following commands:
> 
> Server:
>   $ tcp_crr -6
> 
> Client:
>   $ tcp_crr -6 -c -H ${server_ip}
> 
> If the client and server run on different machines with a 50 Gbps NIC,
> there is no visible impact of the change.
> 
> For the same-machine experiment, with v6.11-rc5 as base:
> 
>            base (throughput)     with-patch
> tcp_crr   14545 (+- 80)         14463 (+- 56)
> 
> It seems like the performance impact is within the noise.
> 
> Link: https://github.com/google/neper [1]
> Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
> Reviewed-by: Roman Gushchin <roman.gushchin@...ux.dev>
> ---
> v3: https://lore.kernel.org/all/20240829175339.2424521-1-shakeel.butt@linux.dev/
> Changes since v3:
> - Add kernel doc for kmem_cache_charge.
> 
> v2: https://lore.kernel.org/all/20240827235228.1591842-1-shakeel.butt@linux.dev/
> Changes since v2:
> - Add handling of already charged large kmalloc objects.
> - Move the normal kmalloc cache check into a function.
> 
> v1: https://lore.kernel.org/all/20240826232908.4076417-1-shakeel.butt@linux.dev/
> Changes since v1:
> - Correctly handle large allocations which bypass slab
> - Rearrange code to avoid compilation errors for !CONFIG_MEMCG builds
> 
> RFC: https://lore.kernel.org/all/20240824010139.1293051-1-shakeel.butt@linux.dev/
> Changes since the RFC:
> - Added check for already charged slab objects.
> - Added performance results from neper's tcp_crr
> 
> 
>   include/linux/slab.h            | 20 ++++++++++++++
>   mm/slab.h                       |  7 +++++
>   mm/slub.c                       | 49 +++++++++++++++++++++++++++++++++
>   net/ipv4/inet_connection_sock.c |  5 ++--
>   4 files changed, 79 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index eb2bf4629157..68789c79a530 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -547,6 +547,26 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>   			    gfp_t gfpflags) __assume_slab_alignment __malloc;
>   #define kmem_cache_alloc_lru(...)	alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
>   
> +/**
> + * kmem_cache_charge - memcg charge an already allocated slab memory
> + * @objp: address of the slab object to memcg charge.
> + * @gfpflags: describe the allocation context
> + *
> + * kmem_cache_charge is the normal method to charge a slab object to the current
> + * memcg. The objp should be a pointer returned by the slab allocator functions
> + * like kmalloc or kmem_cache_alloc. The memcg charge behavior can be controlled
> + * through the gfpflags parameter.
> + *
> + * There are several cases where it will return true regardless. More
> + * specifically:
> + *
> + * 1. For !CONFIG_MEMCG or cgroup_disable=memory systems.
> + * 2. Already charged slab objects.
> + * 3. For slab objects from KMALLOC_NORMAL caches.
> + *
> + * Return: true if charge was successful otherwise false.
> + */
> +bool kmem_cache_charge(void *objp, gfp_t gfpflags);
>   void kmem_cache_free(struct kmem_cache *s, void *objp);
>   
>   kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
> diff --git a/mm/slab.h b/mm/slab.h
> index dcdb56b8e7f5..9f907e930609 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -443,6 +443,13 @@ static inline bool is_kmalloc_cache(struct kmem_cache *s)
>   	return (s->flags & SLAB_KMALLOC);
>   }
>   
> +static inline bool is_kmalloc_normal(struct kmem_cache *s)
> +{
> +	if (!is_kmalloc_cache(s))
> +		return false;
> +	return !(s->flags & (SLAB_CACHE_DMA|SLAB_ACCOUNT|SLAB_RECLAIM_ACCOUNT));
> +}
> +
>   /* Legal flag mask for kmem_cache_create(), for various configurations */
>   #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
>   			 SLAB_CACHE_DMA32 | SLAB_PANIC | \
> diff --git a/mm/slub.c b/mm/slub.c
> index c9d8a2497fd6..3f2a89f7a23a 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2185,6 +2185,41 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
>   
>   	__memcg_slab_free_hook(s, slab, p, objects, obj_exts);
>   }
> +
> +static __fastpath_inline
> +bool memcg_slab_post_charge(void *p, gfp_t flags)
> +{
> +	struct slabobj_ext *slab_exts;
> +	struct kmem_cache *s;
> +	struct folio *folio;
> +	struct slab *slab;
> +	unsigned long off;
> +
> +	folio = virt_to_folio(p);
> +	if (!folio_test_slab(folio)) {
> +		return folio_memcg_kmem(folio) ||
> +			(__memcg_kmem_charge_page(folio_page(folio, 0), flags,
> +						  folio_order(folio)) == 0);
> +	}
> +
> +	slab = folio_slab(folio);
> +	s = slab->slab_cache;
> +
> +	/* Ignore KMALLOC_NORMAL cache to avoid circular dependency. */
> +	if (is_kmalloc_normal(s))
> +		return true;
> +
> +	/* Ignore already charged objects. */
> +	slab_exts = slab_obj_exts(slab);
> +	if (slab_exts) {
> +		off = obj_to_index(s, slab, p);
> +		if (unlikely(slab_exts[off].objcg))
> +			return true;
> +	}
> +
> +	return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
> +}
> +
>   #else /* CONFIG_MEMCG */
>   static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
>   					      struct list_lru *lru,
> @@ -2198,6 +2233,11 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
>   					void **p, int objects)
>   {
>   }
> +
> +static inline bool memcg_slab_post_charge(void *p, gfp_t flags)
> +{
> +	return true;
> +}
>   #endif /* CONFIG_MEMCG */
>   
>   /*
> @@ -4062,6 +4102,15 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>   }
>   EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
>   
> +bool kmem_cache_charge(void *objp, gfp_t gfpflags)
> +{
> +	if (!memcg_kmem_online())
> +		return true;
> +
> +	return memcg_slab_post_charge(objp, gfpflags);
> +}
> +EXPORT_SYMBOL(kmem_cache_charge);
> +
>   /**
>    * kmem_cache_alloc_node - Allocate an object on the specified node
>    * @s: The cache to allocate from.
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 64d07b842e73..3c13ca8c11fb 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -715,6 +715,7 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>   	release_sock(sk);
>   	if (newsk && mem_cgroup_sockets_enabled) {
>   		int amt = 0;
> +		gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
>   
>   		/* atomically get the memory usage, set and charge the
>   		 * newsk->sk_memcg.
> @@ -731,8 +732,8 @@ struct sock *inet_csk_accept(struct sock *sk, struct proto_accept_arg *arg)
>   		}
>   
>   		if (amt)
> -			mem_cgroup_charge_skmem(newsk->sk_memcg, amt,
> -						GFP_KERNEL | __GFP_NOFAIL);
> +			mem_cgroup_charge_skmem(newsk->sk_memcg, amt, gfp);
> +		kmem_cache_charge(newsk, gfp);
>   
>   		release_sock(newsk);
>   	}

The networking bits look sane to me, with a very minor nit about the
reverse xmas tree order of the variable declarations above.
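
For illustration only, reverse xmas tree order means the longest declaration
line comes first and the shortest last. A minimal userspace sketch of the
reordered declarations follows; the `gfp_t` typedef and the flag values are
hypothetical placeholders so the snippet compiles outside the kernel, and
`accept_charge_flags()` is a made-up wrapper, not a function from the patch.

```c
/* Hypothetical placeholder type and values, NOT the kernel definitions. */
typedef unsigned int gfp_t;
#define GFP_KERNEL	0x1u
#define __GFP_NOFAIL	0x2u

/* Made-up wrapper that only exists to show the declaration order. */
static gfp_t accept_charge_flags(void)
{
	/* Reverse xmas tree: longest declaration first, shortest last. */
	gfp_t gfp = GFP_KERNEL | __GFP_NOFAIL;
	int amt = 0;

	(void)amt;	/* unused in this sketch */
	return gfp;
}
```

Applied to the hunk above, this would simply mean placing the new `gfp`
declaration before `int amt = 0;`.
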

Acked-by: Paolo Abeni <pabeni@...hat.com>