Message-ID: <3edf7d6a-5e32-45f1-a6fc-ca5ca786551b@huaweicloud.com>
Date: Mon, 15 Dec 2025 15:38:16 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Kairui Song <ryncsn@...il.com>, linux-mm@...ck.org,
Johannes Weiner <hannes@...xchg.org>, Hugh Dickins <hughd@...gle.com>,
Michal Hocko <mhocko@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeel.butt@...ux.dev>, Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: Kairui Song <kasong@...cent.com>
Subject: Re: [PATCH RFC] mm/memcontrol: make lru_zone_size atomic and simplify
sanity check
On 2025/12/15 14:45, Kairui Song wrote:
> From: Kairui Song <kasong@...cent.com>
>
> commit ca707239e8a7 ("mm: update_lru_size warn and reset bad lru_size")
> introduced a sanity check to catch memcg counter underflow, which is
> more of a workaround for another bug: lru_zone_size is unsigned, so an
> underflow wraps it around to an enormously large number, and the memcg
> shrinker then loops almost forever because the calculated number of
> folios to shrink is huge. That commit also checks that a zero value
> matches an empty LRU list, so we have to hold the LRU lock and do the
> counter update differently depending on whether nr_pages is negative.
>
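Just to illustrate the wraparound problem described above, here is a
minimal userspace sketch (plain C with made-up variable names, not the
kernel code itself): an unsigned counter that underflows looks like a
huge number of folios to shrink, while a signed value simply goes
negative and can be clamped at the reader side.

#include <limits.h>
#include <stdio.h>

int main(void)
{
	/* Hypothetical stand-ins for the lru_zone_size counter. */
	unsigned long unsigned_size = 0;
	long signed_size = 0;

	/* Simulate an unbalanced decrement (the underflow case). */
	unsigned_size -= 1;
	signed_size -= 1;

	/* The unsigned counter wraps around to ULONG_MAX and looks like a
	 * huge number of folios to shrink; the signed one just goes
	 * negative and can be clamped to 0 by the reader. */
	printf("unsigned: %lu (ULONG_MAX: %lu)\n", unsigned_size, ULONG_MAX);
	printf("signed:   %ld (reader clamps to %ld)\n",
	       signed_size, signed_size < 0 ? 0L : signed_size);
	return 0;
}
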
> But the later commit b4536f0c829c ("mm, memcg: fix the active list aging
> for lowmem requests when memcg is enabled") already removed the LRU
> emptiness check, so doing the update differently is meaningless now. And if
Agree.
I submitted a patch to address that earlier, but it was not accepted.
For reference, here is the link to that discussion:
https://lore.kernel.org/lkml/CAOUHufbCCkOBGcSPZqNY+FXcrH8+U7_nRvftzOzKUBS4hn+kuQ@mail.gmail.com/
> we just turn it into an atomic long, underflow isn't a big issue either,
> and can be checked at the reader side. The reader side is called much
> less frequently than the updater.
>
> So let's turn the counter into an atomic long and check at the
> reader side instead, which has a smaller overhead. Use an atomic to
> avoid potential locking issues. The underflow correction is removed,
> which should be fine: if the LRU size counter leaks massively,
> something else has likely gone very wrong as well, and one should fix
> that leaking site instead.
>
> For now, still keep the LRU lock context. In theory that could be
> removed too since the update is atomic, if we can tolerate a
> temporarily inaccurate reading, but currently there is no benefit in
> doing so.
>
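On the locking point: as a rough userspace sketch of the pattern the
patch moves toward (C11 atomics standing in for the kernel's
atomic_long_t, with made-up names), the update itself needs no lock and
the reader just clamps a transient or buggy negative value:

#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for mz->lru_zone_size[zid][lru]. */
static atomic_long lru_zone_size = 0;

/* Updater: the add itself is atomic, so no lock is needed for it. */
static void update_lru_size(long nr_pages)
{
	atomic_fetch_add(&lru_zone_size, nr_pages);
}

/* Reader: clamp a negative value to 0 instead of letting it wrap. */
static long get_lru_size(void)
{
	long val = atomic_load(&lru_zone_size);

	return val < 0 ? 0 : val;
}

int main(void)
{
	update_lru_size(4);	/* e.g. a 4-page folio added to the LRU */
	update_lru_size(-8);	/* an unbalanced removal drives it negative */
	printf("raw: %ld, clamped: %ld\n",
	       atomic_load(&lru_zone_size), get_lru_size());
	return 0;
}

Of course, whether dropping the LRU lock is actually worthwhile in the
kernel is a separate question, as the patch itself notes.
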
> Signed-off-by: Kairui Song <kasong@...cent.com>
> ---
> include/linux/memcontrol.h | 9 +++++++--
> mm/memcontrol.c | 18 +-----------------
> 2 files changed, 8 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 0651865a4564..197f48faa8ba 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -112,7 +112,7 @@ struct mem_cgroup_per_node {
> /* Fields which get updated often at the end. */
> struct lruvec lruvec;
> CACHELINE_PADDING(_pad2_);
> - unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
> + atomic_long_t lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
> struct mem_cgroup_reclaim_iter iter;
>
> #ifdef CONFIG_MEMCG_NMI_SAFETY_REQUIRES_ATOMIC
> @@ -903,10 +903,15 @@ static inline
> unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec,
> enum lru_list lru, int zone_idx)
> {
> + long val;
> struct mem_cgroup_per_node *mz;
>
> mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> - return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
> + val = atomic_long_read(&mz->lru_zone_size[zone_idx][lru]);
> + if (WARN_ON_ONCE(val < 0))
> + return 0;
> +
> + return val;
> }
>
> void __mem_cgroup_handle_over_high(gfp_t gfp_mask);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9b07db2cb232..d5da09fbe43e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1273,28 +1273,12 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
> int zid, int nr_pages)
> {
> struct mem_cgroup_per_node *mz;
> - unsigned long *lru_size;
> - long size;
>
> if (mem_cgroup_disabled())
> return;
>
> mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> - lru_size = &mz->lru_zone_size[zid][lru];
> -
> - if (nr_pages < 0)
> - *lru_size += nr_pages;
> -
> - size = *lru_size;
> - if (WARN_ONCE(size < 0,
> - "%s(%p, %d, %d): lru_size %ld\n",
> - __func__, lruvec, lru, nr_pages, size)) {
> - VM_BUG_ON(1);
> - *lru_size = 0;
> - }
> -
> - if (nr_pages > 0)
> - *lru_size += nr_pages;
> + atomic_long_add(nr_pages, &mz->lru_zone_size[zid][lru]);
> }
>
> /**
>
> ---
> base-commit: 1ef4e3be45a85a103a667cc39fd68c3826e6acb9
> change-id: 20251211-mz-lru-size-cleanup-c81deccfd5d7
>
> Best regards,
--
Best regards,
Ridong