linux-kernel - Re: [PATCH v1] memcg: add charging of already allocated slab objects

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <EA5F7851-B519-4570-B299-8A096A09D6E7@linux.dev>
Date: Wed, 28 Aug 2024 10:36:06 +0800
From: Muchun Song <muchun.song@...ux.dev>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Roman Gushchin <roman.gushchin@...ux.dev>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Johannes Weiner <hannes@...xchg.org>,
 Michal Hocko <mhocko@...nel.org>,
 Vlastimil Babka <vbabka@...e.cz>,
 David Rientjes <rientjes@...gle.com>,
 Hyeonggon Yoo <42.hyeyoo@...il.com>,
 Eric Dumazet <edumazet@...gle.com>,
 "David S . Miller" <davem@...emloft.net>,
 Jakub Kicinski <kuba@...nel.org>,
 Paolo Abeni <pabeni@...hat.com>,
 Linux Memory Management List <linux-mm@...ck.org>,
 LKML <linux-kernel@...r.kernel.org>,
 Meta kernel team <kernel-team@...a.com>,
 cgroups@...r.kernel.org,
 netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH v1] memcg: add charging of already allocated slab objects



> On Aug 28, 2024, at 01:23, Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> 
> On Tue, Aug 27, 2024 at 03:06:32AM GMT, Roman Gushchin wrote:
>> On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote:
>>> At the moment, the slab objects are charged to the memcg at the
>>> allocation time. However there are cases where slab objects are
>>> allocated at the time where the right target memcg to charge it to is
>>> not known. One such case is the network sockets for the incoming
>>> connection which are allocated in the softirq context.
>>> 
>>> Couple hundred thousand connections are very normal on large loaded
>>> server and almost all of those sockets underlying those connections get
>>> allocated in the softirq context and thus not charged to any memcg.
>>> However later at the accept() time we know the right target memcg to
>>> charge. Let's add new API to charge already allocated objects, so we can
>>> have better accounting of the memory usage.
>>> 
>>> To measure the performance impact of this change, tcp_crr is used from
>>> the neper [1] performance suite. Basically it is a network ping pong
>>> test with new connection for each ping pong.
>>> 
>>> The server and the client are run inside 3 level of cgroup hierarchy
>>> using the following commands:
>>> 
>>> Server:
>>> $ tcp_crr -6
>>> 
>>> Client:
>>> $ tcp_crr -6 -c -H ${server_ip}
>>> 
>>> If the client and server run on different machines with 50 GBPS NIC,
>>> there is no visible impact of the change.
>>> 
>>> For the same machine experiment with v6.11-rc5 as base.
>>> 
>>>         base (throughput)     with-patch
>>> tcp_crr   14545 (+- 80)         14463 (+- 56)
>>> 
>>> It seems like the performance impact is within the noise.
>>> 
>>> Link: https://github.com/google/neper [1]
>>> Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
>> 
>> Hi Shakeel,
>> 
>> I like the idea and performance numbers look good. However some comments on
>> the implementation:
>> 
> 
> Thanks for taking a look.
> 
>>> ---
>>> 
>>> Changes since the RFC:
>>> - Added check for already charged slab objects.
>>> - Added performance results from neper's tcp_crr
>>> 
>>> include/linux/slab.h            |  1 +
>>> mm/slub.c                       | 54 +++++++++++++++++++++++++++++++++
>>> net/ipv4/inet_connection_sock.c |  5 +--
>>> 3 files changed, 58 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>>> index eb2bf4629157..05cfab107c72 100644
>>> --- a/include/linux/slab.h
>>> +++ b/include/linux/slab.h
>>> @@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>>>    gfp_t gfpflags) __assume_slab_alignment __malloc;
>>> #define kmem_cache_alloc_lru(...) alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
>>> 
>>> +bool kmem_cache_charge(void *objp, gfp_t gfpflags);
>>> void kmem_cache_free(struct kmem_cache *s, void *objp);
>>> 
>>> kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index c9d8a2497fd6..580683597b5c 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
>>> 
>>> __memcg_slab_free_hook(s, slab, p, objects, obj_exts);
>>> }
>>> +
>>> +static __fastpath_inline
>>> +bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags)
>>> +{
>>> + if (likely(!memcg_kmem_online()))
>>> + return true;
>> 
>> We do have this check in kmem_cache_charge(), why do we need to check it again?
>> 
> 
> I missed to remove this one. I am going to rearrange the code bit more
> in these functions to avoid the build errors in non MEMCG builds.
> 
>>> +
>>> + return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
>>> +}
>>> +
>>> #else /* CONFIG_MEMCG */
>>> static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
>>>      struct list_lru *lru,
>>> @@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
>>> void **p, int objects)
>>> {
>>> }
>>> +
>>> +static inline bool memcg_slab_post_charge(struct kmem_cache *s,
>>> +   void *p,
>>> +   gfp_t flags)
>>> +{
>>> + return true;
>>> +}
>>> #endif /* CONFIG_MEMCG */
>>> 
>>> /*
>>> @@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>>> }
>>> EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
>>> 
>>> +#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
>>> +       SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
>>> +
>>> +bool kmem_cache_charge(void *objp, gfp_t gfpflags)
>>> +{
>>> + struct slabobj_ext *slab_exts;
>>> + struct kmem_cache *s;
>>> + struct folio *folio;
>>> + struct slab *slab;
>>> + unsigned long off;
>>> +
>>> + if (!memcg_kmem_online())
>>> + return true;
>>> +
>>> + folio = virt_to_folio(objp);
>>> + if (unlikely(!folio_test_slab(folio)))
>>> + return false;
>> 
>> Does it handle the case of a too-big-to-be-a-slab-object allocation?
>> I think it's better to handle it properly. Also, why return false here?
>> 
> 
> Yes I will fix the too-big-to-be-a-slab-object allocations. I presume I
> should just follow the kfree() hanlding on !folio_test_slab() i.e. that
> the given object is the large or too-big-to-be-a-slab-object.

Hi Shakeel,

If we decide to do this, I suppose you will use memcg_kmem_charge_page
to charge big-object. To be consistent, I suggest renaming kmem_cache_charge
to memcg_kmem_charge to handle both slab object and big-object. And I saw
all the functions related to object charging is moved to memcontrol.c (e.g.
__memcg_slab_post_alloc_hook), so maybe we should also do this for
memcg_kmem_charge?

Muhcun,
Thanks.