Message-ID: <ebdhvcwygvnfejai5azhg3sjudsjorwmlcvmzadpkhexoeq3tb@5gj5y2exdhpn>
Date: Fri, 2 Jan 2026 18:21:00 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Michal Koutný <mkoutny@...e.com>
Cc: Qi Zheng <qi.zheng@...ux.dev>, Shakeel Butt <shakeel.butt@...ux.dev>, 
	hannes@...xchg.org, hughd@...gle.com, mhocko@...e.com, roman.gushchin@...ux.dev, 
	muchun.song@...ux.dev, david@...nel.org, lorenzo.stoakes@...cle.com, ziy@...dia.com, 
	harry.yoo@...cle.com, imran.f.khan@...cle.com, kamalesh.babulal@...cle.com, 
	axelrasmussen@...gle.com, yuanchu@...gle.com, weixugc@...gle.com, 
	chenridong@...weicloud.com, akpm@...ux-foundation.org, hamzamahfooz@...ux.microsoft.com, 
	apais@...ux.microsoft.com, lance.yang@...ux.dev, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, cgroups@...r.kernel.org, Qi Zheng <zhengqi.arch@...edance.com>
Subject: Re: [PATCH v2 00/28] Eliminate Dying Memory Cgroup

On Mon, Dec 29, 2025 at 11:52:52AM +0100, Michal Koutný wrote:
> On Tue, Dec 23, 2025 at 04:36:18PM -0800, Shakeel Butt <shakeel.butt@...ux.dev> wrote:
> ...
> > The core stats update functions are mod_memcg_state() and
> > mod_memcg_lruvec_state(). If for v1 only, we add additional check for
> > CSS_DYING and go to parent if CSS_DYING is set then shouldn't we avoid
> > this issue?
> 
> ...and go to first !CSS_DYING ancestor :-/ (as the whole chain of memcgs
> can be offlined)
> 
> IIUC thanks to the reparenting charging (modifying state) to an offlined
> memcg should be an exception...
> 
> 
> On Mon, Dec 29, 2025 at 05:42:43PM +0800, Qi Zheng <qi.zheng@...ux.dev> wrote:
> 
> > > We do reparenting in css_offline() callback and cgroup offlining
> > > happen somewhat like this:
> > > 
> > > 1. Set CSS_DYING
> > > 2. Trigger percpu ref kill
> > > 3. Kernel makes sure css ref killed is seen by all CPUs and then trigger
> > >     css_offline callback.
> > 
> > it seems that we can add the following to
> > mem_cgroup_css_free():
> > 
> > parent->vmstats->state_local += child->vmstats->state_local;
> > 
> > Right? I will continue to take a closer look.
> 
> ...and the time between offlining and free'ing a memcg should not be
> arbitrarily long anymore (right?, the crux of the series).
> So only transferring local stats in mem_cgroup_css_free should yield a
> correct result after limited time range (with possible underflows
> between) with no special precaution for CSS_DYING on charging side.

I don't think this works, unfortunately. Even with the refs from folios
to memcgs dropped at offlining, there could still be long-lived refs
(e.g. from swapped-out entries). So we cannot wait until the memcg is
released or freed to reparent the stats.

I think the right thing to do is, as discussed with Shakeel, to move the
stats at offlining time, after reparenting the LRUs, and to forward
further updates to the first non-dying ancestor.

We'll need to be careful with synchronization. We'll probably need an
RCU sync after reparenting the LRUs and before moving the stats to the
parent, and we'll need to make sure stat updaters get the memcg and
update the stats within the same RCU section, ideally with guards
against breaking this in the future.
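
To make the idea concrete, here is a rough single-threaded userspace
sketch of "move the stats at offlining, then forward later updates to
the first non-dying ancestor". Everything in it is hypothetical:
struct memcg_model, first_live_ancestor(), mod_state_model() and
offline_model() are made-up names, not kernel code, and the `dying`
flag only stands in for CSS_DYING. It deliberately ignores the RCU
ordering discussed above and only marks where that step would go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a memcg; NOT the kernel's struct mem_cgroup. */
struct memcg_model {
	struct memcg_model *parent;	/* NULL for the root */
	bool dying;			/* stands in for CSS_DYING */
	long state_local;		/* stands in for vmstats->state_local */
};

/*
 * Walk up until we find a memcg that is not dying. A whole chain of
 * ancestors may be dying, so a single step to the parent is not enough.
 * If even the root is marked dying, the root itself is returned.
 */
static struct memcg_model *first_live_ancestor(struct memcg_model *memcg)
{
	assert(memcg != NULL);
	while (memcg->dying && memcg->parent)
		memcg = memcg->parent;
	return memcg;
}

/* Stat update: forwarded to the first non-dying ancestor. */
static void mod_state_model(struct memcg_model *memcg, long delta)
{
	first_live_ancestor(memcg)->state_local += delta;
}

/* Offlining: mark the memcg dying, then hand its local stats upwards. */
static void offline_model(struct memcg_model *memcg)
{
	struct memcg_model *target;

	memcg->dying = true;

	/*
	 * Assumption: in the kernel, LRU reparenting and an RCU sync would
	 * have to happen here, before the stats are moved. Not modelled.
	 */

	target = first_live_ancestor(memcg);
	if (target != memcg) {
		target->state_local += memcg->state_local;
		memcg->state_local = 0;
	}
}
```

With a root -> mid -> leaf chain, charging the leaf, offlining the leaf
and then mid, and uncharging via the stale leaf pointer leaves all of
the accounting on the root, with nothing stranded on the dying memcgs.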

> 
> 0.02€,
> Michal


