lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <hrxsjzcyj2uvxi5h5edtkoc3br4ljvz7m36nczoctyyoinrz56@27gcuui7wfpw>
Date: Mon, 5 Jan 2026 16:14:18 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Michal Koutný <mkoutny@...e.com>
Cc: Qi Zheng <qi.zheng@...ux.dev>, hannes@...xchg.org, hughd@...gle.com, 
	mhocko@...e.com, roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, 
	muchun.song@...ux.dev, david@...nel.org, lorenzo.stoakes@...cle.com, ziy@...dia.com, 
	harry.yoo@...cle.com, imran.f.khan@...cle.com, kamalesh.babulal@...cle.com, 
	axelrasmussen@...gle.com, yuanchu@...gle.com, weixugc@...gle.com, 
	chenridong@...weicloud.com, akpm@...ux-foundation.org, hamzamahfooz@...ux.microsoft.com, 
	apais@...ux.microsoft.com, lance.yang@...ux.dev, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, cgroups@...r.kernel.org, Muchun Song <songmuchun@...edance.com>, 
	Qi Zheng <zhengqi.arch@...edance.com>
Subject: Re: [PATCH v2 27/28] mm: memcontrol: eliminate the problem of dying
 memory cgroup for LRU folios

On Mon, Jan 05, 2026 at 11:41:46AM +0100, Michal Koutný wrote:
> Hi Qi.
> 
> On Wed, Dec 17, 2025 at 03:27:51PM +0800, Qi Zheng <qi.zheng@...ux.dev> wrote:
> 
> > @@ -5200,22 +5238,27 @@ int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry)
> >  	unsigned int nr_pages = folio_nr_pages(folio);
> >  	struct page_counter *counter;
> >  	struct mem_cgroup *memcg;
> > +	struct obj_cgroup *objcg;
> >  
> >  	if (do_memsw_account())
> >  		return 0;
> >  
> > -	memcg = folio_memcg(folio);
> > -
> > -	VM_WARN_ON_ONCE_FOLIO(!memcg, folio);
> > -	if (!memcg)
> > +	objcg = folio_objcg(folio);
> > +	VM_WARN_ON_ONCE_FOLIO(!objcg, folio);
> > +	if (!objcg)
> >  		return 0;
> >  
> > +	rcu_read_lock();
> > +	memcg = obj_cgroup_memcg(objcg);
> >  	if (!entry.val) {
> >  		memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
> > +		rcu_read_unlock();
> >  		return 0;
> >  	}
> >  
> >  	memcg = mem_cgroup_id_get_online(memcg);
> > +	/* memcg is pined by memcg ID. */
> > +	rcu_read_unlock();
> >  
> >  	if (!mem_cgroup_is_root(memcg) &&
> >  	    !page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
> 
> Later there is:
> 	swap_cgroup_record(folio, mem_cgroup_id(memcg), entry);
> 
> As per the comment memcg remains pinned by the ID which is associated
> with a swap slot, i.e. theoretically time unbound (shmem).
> (This was actually brought up by Yosry in stats subthread [1])
> 
> I think that should be tackled too to eliminate the problem completely.

FWIW, I am not sure if swap entries is the last cause of pinning memcgs,
I am pretty sure there will be others that we haven't found yet. This is
why I think we shouldn't assume that the time between offlining and
releasing a memcg is short or bounded when fixing the stats problem.

> 
> As I look at the code, these memcg IDs (private [2]) could be converted
> to objcg IDs so that reparenting applies also to folios that are
> currently swapped out. (Or convert to swap_cgroup_ctrl from the vector
> of IDs to a vector of objcg pointers, depending on space.)

I think we can do objcg IDs, but be careful to keep the same behavior as
today and avoid overexhausting the 16 bit ID space. So we need to also
drop the ref to the objcg ID when the memcg is offlined and the objcg is
reparented, such that the objcg ID is deleted unless there are swapped
out entries.

I think this can be done on top of this series, not necessarily as part
of it.

> 
> Thanks,
> Michal
> 
> [1] https://lore.kernel.org/r/ebdhvcwygvnfejai5azhg3sjudsjorwmlcvmzadpkhexoeq3tb@5gj5y2exdhpn
> [2] https://lore.kernel.org/r/20251225232116.294540-1-shakeel.butt@linux.dev



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ