Message-ID: <1b6cb2c2-9aed-456e-a803-afad9731cb42@huaweicloud.com>
Date: Fri, 27 Jun 2025 16:50:23 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Johannes Weiner <hannes@...xchg.org>, Kairui Song <ryncsn@...il.com>
Cc: Muchun Song <muchun.song@...ux.dev>,
Muchun Song <songmuchun@...edance.com>, mhocko@...nel.org,
roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, akpm@...ux-foundation.org,
david@...morbit.com, zhengqi.arch@...edance.com, yosry.ahmed@...ux.dev,
nphamcs@...il.com, chengming.zhou@...ux.dev, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, linux-mm@...ck.org,
hamzamahfooz@...ux.microsoft.com, apais@...ux.microsoft.com,
yuzhao@...gle.com
Subject: Re: [PATCH RFC 00/28] Eliminate Dying Memory Cgroup
On 2025/4/18 3:04, Johannes Weiner wrote:
> On Fri, Apr 18, 2025 at 02:22:12AM +0800, Kairui Song wrote:
>> On Tue, Apr 15, 2025 at 4:02 PM Muchun Song <muchun.song@...ux.dev> wrote:
>> We currently have some workloads running with `nokmem` due to objcg
>> performance issues. I know there are efforts to improve them, but so
>> far it's still not painless to have. So I'm a bit worried about
>> this...
>
> That's presumably more about the size and corresponding rate of slab
> allocations. The objcg path has the same percpu cached charging and
> uncharging, direct task pointer etc. as the direct memcg path. Not
> sure the additional objcg->memcg indirection in the slowpath would be
> noticable among hierarchical page counter atomics...
>
We have encountered the same memory accounting performance issue with
kmem in our environment running cgroup v1 on Linux kernel v6.6. We have
observed significant performance overhead in the following critical path:
  alloc_pages
    __alloc_pages
      __memcg_kmem_charge_page
        memcg_account_kmem
          page_counter_charge
Our profiling shows this call chain accounts for over 23% of CPU time.
This bottleneck occurs because multiple Docker containers charge their
common parent's page_counter simultaneously, creating contention on its
atomic operations.
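For reference, the contention is visible in the charge path itself:
page_counter_charge() walks up the counter hierarchy and performs an
atomic RMW on every ancestor, so every child container ends up hitting
the same parent counter cacheline. A simplified sketch of that loop
(modelled on mm/page_counter.c, with propagate_protected_usage() and
watermark tracking omitted):

/*
 * Simplified sketch of the hierarchical charge loop in
 * mm/page_counter.c (protected-usage propagation and watermark
 * tracking omitted).  Every charge from a child memcg does an atomic
 * add on each ancestor's usage, which is where the contention on the
 * shared parent shows up.
 */
static void page_counter_charge_sketch(struct page_counter *counter,
				       unsigned long nr_pages)
{
	struct page_counter *c;

	for (c = counter; c; c = c->parent)
		atomic_long_add(nr_pages, &c->usage);
}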
While cgroup v1 is being deprecated, many production systems still rely
on it. To mitigate this issue, I'm considering implementing a per-CPU
stock mechanism specifically for memcg_account_kmem (limited to v1
usage), roughly along the lines of the sketch below. Would this
approach be acceptable?
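As an illustrative sketch only (the kmem_stock / KMEM_STOCK_BATCH names
are made up here), modelled on the existing consume_stock()/
refill_stock() pattern used for regular memory charges in
mm/memcontrol.c:

/* Hypothetical per-CPU stock for v1 kmem charges (sketch only). */
struct kmem_stock_pcp {
	struct mem_cgroup *cached;	/* memcg the pre-charge belongs to */
	unsigned int nr_pages;		/* pages already charged to the counters */
};
static DEFINE_PER_CPU(struct kmem_stock_pcp, kmem_stock);
#define KMEM_STOCK_BATCH	64U

/* Fast path: consume from the local stock without touching page_counter. */
static bool consume_kmem_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
{
	struct kmem_stock_pcp *stock;
	unsigned long flags;
	bool ret = false;

	if (nr_pages > KMEM_STOCK_BATCH)
		return ret;

	local_irq_save(flags);
	stock = this_cpu_ptr(&kmem_stock);
	if (stock->cached == memcg && stock->nr_pages >= nr_pages) {
		stock->nr_pages -= nr_pages;
		ret = true;
	}
	local_irq_restore(flags);

	return ret;
}

On a miss, memcg_account_kmem() would charge KMEM_STOCK_BATCH pages to
the hierarchy with a single page_counter_charge() and park the surplus
in the local stock, so the atomics on the shared parent are paid once
per batch rather than once per allocation. The stock would of course
need to be drained on memcg offlining, as memcg_stock already is.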
Best regards,
Ridong
>> This is a problem indeed, but isn't reparenting a rather rare
>> operation? So a slow async worker might be just fine?
>
> That could be millions of pages that need updating. rmdir is no fast
> path, but that's a lot of work compared to flipping objcg->memcg and
> doing a list_splice().
>
> We used to do this in the past, if you check the git history. That's
> not a desirable direction to take again, certainly not without hard
> data showing that objcg is an absolute no go.