Message-ID: <3edf7d6a-5e32-45f1-a6fc-ca5ca786551b@huaweicloud.com>
Date: Mon, 15 Dec 2025 15:38:16 +0800
From: Chen Ridong <chenridong@...weicloud.com>
To: Kairui Song <ryncsn@...il.com>, linux-mm@...ck.org,
Johannes Weiner <hannes@...xchg.org>, Hugh Dickins <hughd@...gle.com>,
Michal Hocko <mhocko@...nel.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
Shakeel Butt <shakeel.butt@...ux.dev>, Muchun Song <muchun.song@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org
Cc: Kairui Song <kasong@...cent.com>
Subject: Re: [PATCH RFC] mm/memcontrol: make lru_zone_size atomic and simplify
sanity check
On 2025/12/15 14:45, Kairui Song wrote:
> From: Kairui Song <kasong@...cent.com>
>
> commit ca707239e8a7 ("mm: update_lru_size warn and reset bad lru_size")
> introduced a sanity check to catch memcg counter underflow, which is
> more of a workaround for another bug: lru_zone_size is unsigned, so an
> underflow wraps it around to an enormously large number, and the memcg
> shrinker then loops almost forever because the calculated number of
> folios to shrink is huge. That commit also checks that a zero value
> matches an empty LRU list, so we have to hold the LRU lock and do the
> counter update differently depending on whether nr_pages is negative.
>
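Just to illustrate the wraparound problem described above, here is a
minimal userspace sketch (plain C with made-up variable names, not the
kernel code itself): an unsigned counter that underflows looks like a
huge number of folios to shrink, while a signed value simply goes
negative and can be clamped at the reader side.

#include <limits.h>
#include <stdio.h>

int main(void)
{
	/* Hypothetical stand-ins for the lru_zone_size counter. */
	unsigned long unsigned_size = 0;
	long signed_size = 0;

	/* Simulate an unbalanced decrement (the underflow case). */
	unsigned_size -= 1;
	signed_size -= 1;

	/* The unsigned counter wraps around to ULONG_MAX and looks like a
	 * huge number of folios to shrink; the signed one just goes
	 * negative and can be clamped to 0 by the reader. */
	printf("unsigned: %lu (ULONG_MAX: %lu)\n", unsigned_size, ULONG_MAX);
	printf("signed:   %ld (reader clamps to %ld)\n",
	       signed_size, signed_size < 0 ? 0L : signed_size);
	return 0;
}
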
> But the later commit b4536f0c829c ("mm, memcg: fix the active list aging
> for lowmem requests when memcg is enabled") already removed the LRU
> emptiness check, so doing the update differently is meaningless now. And if
Agree.
I submitted a patch to address that earlier, but it was not accepted.
For reference, here is the link to that discussion:
https://lore.kernel.org/lkml/CAOUHufbCCkOBGcSPZqNY+FXcrH8+U7_nRvftzOzKUBS4hn+kuQ@mail.gmail.com/
> we just turn it into an atomic long, underflow isn't a big issue either,
> and can be checked at the reader side. The reader side is called much
> less frequently than the updater.
>
> So let's turn the counter into an atomic long and check at the
> reader side instead, which has a smaller overhead. Use an atomic to
> avoid potential locking issues. The underflow correction is removed,
> which should be fine: if the LRU size counter leaks massively,
> something else has likely gone very wrong as well, and one should fix
> that leaking site instead.
>
> For now, still keep the LRU lock context. In theory that could be
> removed too since the update is atomic, if we can tolerate a
> temporarily inaccurate reading, but currently there is no benefit in
> doing so.
>
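On the locking point: as a rough userspace sketch of the pattern the
patch moves toward (C11 atomics standing in for the kernel's
atomic_long_t, with made-up names), the update itself needs no lock and
the reader just clamps a transient or buggy negative value:

#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical stand-in for mz->lru_zone_size[zid][lru]. */
static atomic_long lru_zone_size = 0;

/* Updater: the add itself is atomic, so no lock is needed for it. */
static void update_lru_size(long nr_pages)
{
	atomic_fetch_add(&lru_zone_size, nr_pages);
}

/* Reader: clamp a negative value to 0 instead of letting it wrap. */
static long get_lru_size(void)
{
	long val = atomic_load(&lru_zone_size);

	return val < 0 ? 0 : val;
}

int main(void)
{
	update_lru_size(4);	/* e.g. a 4-page folio added to the LRU */
	update_lru_size(-8);	/* an unbalanced removal drives it negative */
	printf("raw: %ld, clamped: %ld\n",
	       atomic_load(&lru_zone_size), get_lru_size());
	return 0;
}

Of course, whether dropping the LRU lock is actually worthwhile in the
kernel is a separate question, as the patch itself notes.
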
> Signed-off-by: Kairui Song <kasong@...cent.com>
> ---
> include/linux/memcontrol.h | 9 +++++++--
> mm/memcontrol.c | 18 +-----------------
> 2 files changed, 8 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 0651865a4564..197f48faa8ba 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -112,7 +112,7 @@ struct mem_cgroup_per_node {
> /* Fields which get updated often at the end. */
> struct lruvec lruvec;
> CACHELINE_PADDING(_pad2_);
> - unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
> + atomic_long_t lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
> struct mem_cgroup_reclaim_iter iter;
>
> #ifdef CONFIG_MEMCG_NMI_SAFETY_REQUIRES_ATOMIC
> @@ -903,10 +903,15 @@ static inline
> unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec,
> enum lru_list lru, int zone_idx)
> {
> + long val;
> struct mem_cgroup_per_node *mz;
>
> mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> - return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
> + val = atomic_long_read(&mz->lru_zone_size[zone_idx][lru]);
> + if (WARN_ON_ONCE(val < 0))
> + return 0;
> +
> + return val;
> }
>
> void __mem_cgroup_handle_over_high(gfp_t gfp_mask);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9b07db2cb232..d5da09fbe43e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1273,28 +1273,12 @@ void mem_cgroup_update_lru_size(struct lruvec *lruvec, enum lru_list lru,
> int zid, int nr_pages)
> {
> struct mem_cgroup_per_node *mz;
> - unsigned long *lru_size;
> - long size;
>
> if (mem_cgroup_disabled())
> return;
>
> mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> - lru_size = &mz->lru_zone_size[zid][lru];
> -
> - if (nr_pages < 0)
> - *lru_size += nr_pages;
> -
> - size = *lru_size;
> - if (WARN_ONCE(size < 0,
> - "%s(%p, %d, %d): lru_size %ld\n",
> - __func__, lruvec, lru, nr_pages, size)) {
> - VM_BUG_ON(1);
> - *lru_size = 0;
> - }
> -
> - if (nr_pages > 0)
> - *lru_size += nr_pages;
> + atomic_long_add(nr_pages, &mz->lru_zone_size[zid][lru]);
> }
>
> /**
>
> ---
> base-commit: 1ef4e3be45a85a103a667cc39fd68c3826e6acb9
> change-id: 20251211-mz-lru-size-cleanup-c81deccfd5d7
>
> Best regards,
--
Best regards,
Ridong