Message-ID: <1b6cb2c2-9aed-456e-a803-afad9731cb42@huaweicloud.com>
Date: Fri, 27 Jun 2025 16:50:23 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Johannes Weiner <hannes@...xchg.org>, Kairui Song <ryncsn@...il.com>
Cc: Muchun Song <muchun.song@...ux.dev>,
 Muchun Song <songmuchun@...edance.com>, mhocko@...nel.org,
 roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, akpm@...ux-foundation.org,
 david@...morbit.com, zhengqi.arch@...edance.com, yosry.ahmed@...ux.dev,
 nphamcs@...il.com, chengming.zhou@...ux.dev, linux-kernel@...r.kernel.org,
 cgroups@...r.kernel.org, linux-mm@...ck.org,
 hamzamahfooz@...ux.microsoft.com, apais@...ux.microsoft.com,
 yuzhao@...gle.com
Subject: Re: [PATCH RFC 00/28] Eliminate Dying Memory Cgroup



On 2025/4/18 3:04, Johannes Weiner wrote:
> On Fri, Apr 18, 2025 at 02:22:12AM +0800, Kairui Song wrote:
>> On Tue, Apr 15, 2025 at 4:02 PM Muchun Song <muchun.song@...ux.dev> wrote:
>> We currently have some workloads running with `nokmem` due to objcg
>> performance issues. I know there are efforts to improve them, but so
>> far it's still not painless to have. So I'm a bit worried about
>> this...
> 
> That's presumably more about the size and corresponding rate of slab
> allocations. The objcg path has the same percpu cached charging and
> uncharging, direct task pointer etc. as the direct memcg path. Not
> sure the additional objcg->memcg indirection in the slowpath would be
> noticable among hierarchical page counter atomics...
> 

We have encountered the same memory accounting performance issue with
kmem in our environment running cgroup v1 on Linux kernel v6.6. We have
observed significant performance overhead in the following critical path:

alloc_pages
  __alloc_pages
    __memcg_kmem_charge_page
      memcg_account_kmem
        page_counter_charge

Our profiling shows this call chain accounts for over 23% of CPU time.
This bottleneck occurs because multiple Docker containers
simultaneously charge their common parent's page_counter, creating
contention on its atomic operations.

While cgroup v1 is being deprecated, many production systems still rely
on it. To mitigate this issue, I'm considering implementing a per-CPU
stock mechanism specifically for memcg_account_kmem (limited to v1
usage). Would this approach be acceptable?

Best regards,
Ridong


>> This is a problem indeed, but isn't reparenting a rather rare
>> operation? So a slow async worker might be just fine?
> 
> That could be millions of pages that need updating. rmdir is no fast
> path, but that's a lot of work compared to flipping objcg->memcg and
> doing a list_splice().
> 
> We used to do this in the past, if you check the git history. That's
> not a desirable direction to take again, certainly not without hard
> data showing that objcg is an absolute no go.

