Message-ID: <d016b76d-581a-4582-920d-21f64318090a@linux.dev>
Date: Tue, 6 Jan 2026 15:08:57 +0800
From: Qi Zheng <qi.zheng@...ux.dev>
To: Yosry Ahmed <yosry.ahmed@...ux.dev>, Michal Koutný
<mkoutny@...e.com>
Cc: hannes@...xchg.org, hughd@...gle.com, mhocko@...e.com,
roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, muchun.song@...ux.dev,
david@...nel.org, lorenzo.stoakes@...cle.com, ziy@...dia.com,
harry.yoo@...cle.com, imran.f.khan@...cle.com, kamalesh.babulal@...cle.com,
axelrasmussen@...gle.com, yuanchu@...gle.com, weixugc@...gle.com,
chenridong@...weicloud.com, akpm@...ux-foundation.org,
hamzamahfooz@...ux.microsoft.com, apais@...ux.microsoft.com,
lance.yang@...ux.dev, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
cgroups@...r.kernel.org, Muchun Song <songmuchun@...edance.com>,
Qi Zheng <zhengqi.arch@...edance.com>
Subject: Re: [PATCH v2 27/28] mm: memcontrol: eliminate the problem of dying
memory cgroup for LRU folios
On 1/6/26 12:14 AM, Yosry Ahmed wrote:
> On Mon, Jan 05, 2026 at 11:41:46AM +0100, Michal Koutný wrote:
>> Hi Qi.
>>
>> On Wed, Dec 17, 2025 at 03:27:51PM +0800, Qi Zheng <qi.zheng@...ux.dev> wrote:
>>
>>> @@ -5200,22 +5238,27 @@ int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry)
>>> unsigned int nr_pages = folio_nr_pages(folio);
>>> struct page_counter *counter;
>>> struct mem_cgroup *memcg;
>>> + struct obj_cgroup *objcg;
>>>
>>> if (do_memsw_account())
>>> return 0;
>>>
>>> - memcg = folio_memcg(folio);
>>> -
>>> - VM_WARN_ON_ONCE_FOLIO(!memcg, folio);
>>> - if (!memcg)
>>> + objcg = folio_objcg(folio);
>>> + VM_WARN_ON_ONCE_FOLIO(!objcg, folio);
>>> + if (!objcg)
>>> return 0;
>>>
>>> + rcu_read_lock();
>>> + memcg = obj_cgroup_memcg(objcg);
>>> if (!entry.val) {
>>> memcg_memory_event(memcg, MEMCG_SWAP_FAIL);
>>> + rcu_read_unlock();
>>> return 0;
>>> }
>>>
>>> memcg = mem_cgroup_id_get_online(memcg);
>>> +	/* memcg is pinned by memcg ID. */
>>> + rcu_read_unlock();
>>>
>>> if (!mem_cgroup_is_root(memcg) &&
>>> !page_counter_try_charge(&memcg->swap, nr_pages, &counter)) {
>>
>> Later there is:
>> swap_cgroup_record(folio, mem_cgroup_id(memcg), entry);
>>
>> As per the comment memcg remains pinned by the ID which is associated
>> with a swap slot, i.e. theoretically time unbound (shmem).
>> (This was actually brought up by Yosry in stats subthread [1])
>>
>> I think that should be tackled too to eliminate the problem completely.
>
> FWIW, I am not sure swap entries are the last cause of pinning memcgs;
> I am pretty sure there will be others that we haven't found yet. This is
Agree.
> why I think we shouldn't assume that the time between offlining and
> releasing a memcg is short or bounded when fixing the stats problem.
If I have not misunderstood your suggestion in the other thread, I plan
to do the following in v3:
1. Define a memcgv1-only function:

void memcg1_reparent_state_local(struct mem_cgroup *memcg,
				 struct mem_cgroup *parent)
{
	int i;

	synchronize_rcu();

	for (i = 0; i < ARRAY_SIZE(memcg1_stats); i++) {
		int idx = memcg1_stats[i];
		unsigned long value = memcg_page_state_local(memcg, idx);

		mod_memcg_page_state_local(parent, idx, value);
	}
}
2. Call it after reparent_unlocks():

memcg_reparent_objcgs()
--> objcg = __memcg_reparent_objcgs(memcg, parent);
    reparent_unlocks(memcg, parent);
    reparent_state_local(memcg, parent);
    --> memcg1_reparent_state_local()
>
>>
>> As I look at the code, these memcg IDs (private [2]) could be converted
>> to objcg IDs so that reparenting applies also to folios that are
>> currently swapped out. (Or convert to swap_cgroup_ctrl from the vector
>> of IDs to a vector of objcg pointers, depending on space.)
>
> I think we can do objcg IDs, but be careful to keep the same behavior as
> today and avoid overexhausting the 16 bit ID space. So we need to also
> drop the ref to the objcg ID when the memcg is offlined and the objcg is
> reparented, such that the objcg ID is deleted unless there are swapped
> out entries.
>
> I think this can be done on top of this series, not necessarily as part
> of it.
Agree, I prefer to address this issue in a separate patchset.
Thanks,
Qi
>
>>
>> Thanks,
>> Michal
>>
>> [1] https://lore.kernel.org/r/ebdhvcwygvnfejai5azhg3sjudsjorwmlcvmzadpkhexoeq3tb@5gj5y2exdhpn
>> [2] https://lore.kernel.org/r/20251225232116.294540-1-shakeel.butt@linux.dev
>
>