[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <EA5F7851-B519-4570-B299-8A096A09D6E7@linux.dev>
Date: Wed, 28 Aug 2024 10:36:06 +0800
From: Muchun Song <muchun.song@...ux.dev>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Roman Gushchin <roman.gushchin@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>,
David Rientjes <rientjes@...gle.com>,
Hyeonggon Yoo <42.hyeyoo@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
"David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>,
Linux Memory Management List <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Meta kernel team <kernel-team@...a.com>,
cgroups@...r.kernel.org,
netdev <netdev@...r.kernel.org>
Subject: Re: [PATCH v1] memcg: add charging of already allocated slab objects
> On Aug 28, 2024, at 01:23, Shakeel Butt <shakeel.butt@...ux.dev> wrote:
>
> On Tue, Aug 27, 2024 at 03:06:32AM GMT, Roman Gushchin wrote:
>> On Mon, Aug 26, 2024 at 04:29:08PM -0700, Shakeel Butt wrote:
>>> At the moment, the slab objects are charged to the memcg at the
>>> allocation time. However there are cases where slab objects are
>>> allocated at the time where the right target memcg to charge it to is
>>> not known. One such case is the network sockets for the incoming
>>> connection which are allocated in the softirq context.
>>>
>>> Couple hundred thousand connections are very normal on large loaded
>>> server and almost all of those sockets underlying those connections get
>>> allocated in the softirq context and thus not charged to any memcg.
>>> However later at the accept() time we know the right target memcg to
>>> charge. Let's add new API to charge already allocated objects, so we can
>>> have better accounting of the memory usage.
>>>
>>> To measure the performance impact of this change, tcp_crr is used from
>>> the neper [1] performance suite. Basically it is a network ping pong
>>> test with new connection for each ping pong.
>>>
>>> The server and the client are run inside 3 level of cgroup hierarchy
>>> using the following commands:
>>>
>>> Server:
>>> $ tcp_crr -6
>>>
>>> Client:
>>> $ tcp_crr -6 -c -H ${server_ip}
>>>
>>> If the client and server run on different machines with 50 GBPS NIC,
>>> there is no visible impact of the change.
>>>
>>> For the same machine experiment with v6.11-rc5 as base.
>>>
>>> base (throughput) with-patch
>>> tcp_crr 14545 (+- 80) 14463 (+- 56)
>>>
>>> It seems like the performance impact is within the noise.
>>>
>>> Link: https://github.com/google/neper [1]
>>> Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
>>
>> Hi Shakeel,
>>
>> I like the idea and performance numbers look good. However some comments on
>> the implementation:
>>
>
> Thanks for taking a look.
>
>>> ---
>>>
>>> Changes since the RFC:
>>> - Added check for already charged slab objects.
>>> - Added performance results from neper's tcp_crr
>>>
>>> include/linux/slab.h | 1 +
>>> mm/slub.c | 54 +++++++++++++++++++++++++++++++++
>>> net/ipv4/inet_connection_sock.c | 5 +--
>>> 3 files changed, 58 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>>> index eb2bf4629157..05cfab107c72 100644
>>> --- a/include/linux/slab.h
>>> +++ b/include/linux/slab.h
>>> @@ -547,6 +547,7 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>>> gfp_t gfpflags) __assume_slab_alignment __malloc;
>>> #define kmem_cache_alloc_lru(...) alloc_hooks(kmem_cache_alloc_lru_noprof(__VA_ARGS__))
>>>
>>> +bool kmem_cache_charge(void *objp, gfp_t gfpflags);
>>> void kmem_cache_free(struct kmem_cache *s, void *objp);
>>>
>>> kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags,
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index c9d8a2497fd6..580683597b5c 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -2185,6 +2185,16 @@ void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
>>>
>>> __memcg_slab_free_hook(s, slab, p, objects, obj_exts);
>>> }
>>> +
>>> +static __fastpath_inline
>>> +bool memcg_slab_post_charge(struct kmem_cache *s, void *p, gfp_t flags)
>>> +{
>>> + if (likely(!memcg_kmem_online()))
>>> + return true;
>>
>> We do have this check in kmem_cache_charge(), why do we need to check it again?
>>
>
> I missed to remove this one. I am going to rearrange the code bit more
> in these functions to avoid the build errors in non MEMCG builds.
>
>>> +
>>> + return __memcg_slab_post_alloc_hook(s, NULL, flags, 1, &p);
>>> +}
>>> +
>>> #else /* CONFIG_MEMCG */
>>> static inline bool memcg_slab_post_alloc_hook(struct kmem_cache *s,
>>> struct list_lru *lru,
>>> @@ -2198,6 +2208,13 @@ static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
>>> void **p, int objects)
>>> {
>>> }
>>> +
>>> +static inline bool memcg_slab_post_charge(struct kmem_cache *s,
>>> + void *p,
>>> + gfp_t flags)
>>> +{
>>> + return true;
>>> +}
>>> #endif /* CONFIG_MEMCG */
>>>
>>> /*
>>> @@ -4062,6 +4079,43 @@ void *kmem_cache_alloc_lru_noprof(struct kmem_cache *s, struct list_lru *lru,
>>> }
>>> EXPORT_SYMBOL(kmem_cache_alloc_lru_noprof);
>>>
>>> +#define KMALLOC_TYPE (SLAB_KMALLOC | SLAB_CACHE_DMA | \
>>> + SLAB_ACCOUNT | SLAB_RECLAIM_ACCOUNT)
>>> +
>>> +bool kmem_cache_charge(void *objp, gfp_t gfpflags)
>>> +{
>>> + struct slabobj_ext *slab_exts;
>>> + struct kmem_cache *s;
>>> + struct folio *folio;
>>> + struct slab *slab;
>>> + unsigned long off;
>>> +
>>> + if (!memcg_kmem_online())
>>> + return true;
>>> +
>>> + folio = virt_to_folio(objp);
>>> + if (unlikely(!folio_test_slab(folio)))
>>> + return false;
>>
>> Does it handle the case of a too-big-to-be-a-slab-object allocation?
>> I think it's better to handle it properly. Also, why return false here?
>>
>
> Yes I will fix the too-big-to-be-a-slab-object allocations. I presume I
> should just follow the kfree() hanlding on !folio_test_slab() i.e. that
> the given object is the large or too-big-to-be-a-slab-object.
Hi Shakeel,
If we decide to do this, I suppose you will use memcg_kmem_charge_page
to charge big-object. To be consistent, I suggest renaming kmem_cache_charge
to memcg_kmem_charge to handle both slab object and big-object. And I saw
all the functions related to object charging is moved to memcontrol.c (e.g.
__memcg_slab_post_alloc_hook), so maybe we should also do this for
memcg_kmem_charge?
Muhcun,
Thanks.
Powered by blists - more mailing lists