Message-ID: <1b6cb2c2-9aed-456e-a803-afad9731cb42@huaweicloud.com>
Date: Fri, 27 Jun 2025 16:50:23 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Johannes Weiner <hannes@...xchg.org>, Kairui Song <ryncsn@...il.com>
Cc: Muchun Song <muchun.song@...ux.dev>,
Muchun Song <songmuchun@...edance.com>, mhocko@...nel.org,
roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, akpm@...ux-foundation.org,
david@...morbit.com, zhengqi.arch@...edance.com, yosry.ahmed@...ux.dev,
nphamcs@...il.com, chengming.zhou@...ux.dev, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, linux-mm@...ck.org,
hamzamahfooz@...ux.microsoft.com, apais@...ux.microsoft.com,
yuzhao@...gle.com
Subject: Re: [PATCH RFC 00/28] Eliminate Dying Memory Cgroup
On 2025/4/18 3:04, Johannes Weiner wrote:
> On Fri, Apr 18, 2025 at 02:22:12AM +0800, Kairui Song wrote:
>> On Tue, Apr 15, 2025 at 4:02 PM Muchun Song <muchun.song@...ux.dev> wrote:
>> We currently have some workloads running with `nokmem` due to objcg
>> performance issues. I know there are efforts to improve them, but so
>> far it's still not painless to have. So I'm a bit worried about
>> this...
>
> That's presumably more about the size and corresponding rate of slab
> allocations. The objcg path has the same percpu cached charging and
> uncharging, direct task pointer etc. as the direct memcg path. Not
> sure the additional objcg->memcg indirection in the slowpath would be
> noticable among hierarchical page counter atomics...
>
We have encountered the same memory accounting performance issue with
kmem in our environment running cgroup v1 on Linux kernel v6.6. We have
observed significant performance overhead in the following critical path:
  alloc_pages
    __alloc_pages
      __memcg_kmem_charge_page
        memcg_account_kmem
          page_counter_charge
Our profiling shows this call chain accounts for over 23% of CPU time.
This bottleneck occurs because multiple Docker containers charge their
common parent's page_counter simultaneously, creating contention on its
atomic operations.
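For reference, the contention is visible in the charge path itself:
page_counter_charge() walks up the counter hierarchy and performs an
atomic RMW on every ancestor, so every child container ends up hitting
the same parent counter cacheline. A simplified sketch of that loop
(modelled on mm/page_counter.c, with propagate_protected_usage() and
watermark tracking omitted):

/*
 * Simplified sketch of the hierarchical charge loop in
 * mm/page_counter.c (protected-usage propagation and watermark
 * tracking omitted).  Every charge from a child memcg does an atomic
 * add on each ancestor's usage, which is where the contention on the
 * shared parent shows up.
 */
static void page_counter_charge_sketch(struct page_counter *counter,
				       unsigned long nr_pages)
{
	struct page_counter *c;

	for (c = counter; c; c = c->parent)
		atomic_long_add(nr_pages, &c->usage);
}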
While cgroup v1 is being deprecated, many production systems still rely
on it. To mitigate this issue, I'm considering implementing a per-CPU
stock mechanism specifically for memcg_account_kmem (limited to v1
usage), roughly along the lines of the sketch below. Would this
approach be acceptable?
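As an illustrative sketch only (the kmem_stock / KMEM_STOCK_BATCH names
are made up here), modelled on the existing consume_stock()/
refill_stock() pattern used for regular memory charges in
mm/memcontrol.c:

/* Hypothetical per-CPU stock for v1 kmem charges (sketch only). */
struct kmem_stock_pcp {
	struct mem_cgroup *cached;	/* memcg the pre-charge belongs to */
	unsigned int nr_pages;		/* pages already charged to the counters */
};
static DEFINE_PER_CPU(struct kmem_stock_pcp, kmem_stock);
#define KMEM_STOCK_BATCH	64U

/* Fast path: consume from the local stock without touching page_counter. */
static bool consume_kmem_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
{
	struct kmem_stock_pcp *stock;
	unsigned long flags;
	bool ret = false;

	if (nr_pages > KMEM_STOCK_BATCH)
		return ret;

	local_irq_save(flags);
	stock = this_cpu_ptr(&kmem_stock);
	if (stock->cached == memcg && stock->nr_pages >= nr_pages) {
		stock->nr_pages -= nr_pages;
		ret = true;
	}
	local_irq_restore(flags);

	return ret;
}

On a miss, memcg_account_kmem() would charge KMEM_STOCK_BATCH pages to
the hierarchy with a single page_counter_charge() and park the surplus
in the local stock, so the atomics on the shared parent are paid once
per batch rather than once per allocation. The stock would of course
need to be drained on memcg offlining, as memcg_stock already is.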
Best regards,
Ridong
>> This is a problem indeed, but isn't reparenting a rather rare
>> operation? So a slow async worker might be just fine?
>
> That could be millions of pages that need updating. rmdir is no fast
> path, but that's a lot of work compared to flipping objcg->memcg and
> doing a list_splice().
>
> We used to do this in the past, if you check the git history. That's
> not a desirable direction to take again, certainly not without hard
> data showing that objcg is an absolute no go.