Message-ID: <db4d9e73e6a70033da561ed88aef32c1ebe411dd@linux.dev>
Date: Mon, 20 Oct 2025 10:11:23 +0000
From: "Jiayuan Chen" <jiayuan.chen@...ux.dev>
To: "Michal Hocko" <mhocko@...e.com>
Cc: linux-mm@...ck.org, "Andrew Morton" <akpm@...ux-foundation.org>, "Axel
Rasmussen" <axelrasmussen@...gle.com>, "Yuanchu Xie"
<yuanchu@...gle.com>, "Wei Xu" <weixugc@...gle.com>, "Johannes Weiner"
<hannes@...xchg.org>, "David Hildenbrand" <david@...hat.com>, "Qi Zheng"
<zhengqi.arch@...edance.com>, "Shakeel Butt" <shakeel.butt@...ux.dev>,
"Lorenzo Stoakes" <lorenzo.stoakes@...cle.com>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with
memory.low in kswapd
October 17, 2025 at 02:43, "Michal Hocko" <mhocko@...e.com> wrote:
>
> On Thu 16-10-25 15:10:31, Jiayuan Chen wrote:
> [...]
>
> >
> > The issue we encountered is that since the watermark_boost parameter is enabled by
> > default, it causes kswapd to be woken up even when memory watermarks are still relatively
> > high. Due to rapid consecutive wake-ups, kswapd_failures eventually reaches MAX_RECLAIM_RETRIES,
> > causing kswapd to stop running, which ultimately triggers direct memory reclaim.
> >
> > I believe we should choose another approach that avoids breaking the memory.low semantics.
> > Specifically, in cases where kswapd is woken up due to watermark_boost, we should bypass the
> > logic that increments kswapd_failures.
> >
> yes, this seems like unintended side effect of the implementation. Seems
> like a rare problem as low limits would have to be configured very close
> to kswapd watermarks. My assumption has always been that low limits are
> not getting very close to watermarks because that makes any reclaim very
> hard and configuration rather unstable but you might have a very good
> reason to configure the memory protection that way. It would definitely
> help to describe your specific setup with rationale so that we can look
> into that closer.
> --
> Michal Hocko
> SUSE Labs
>
Thank you for your response, Michal.
To provide more context about our specific setup:
1. The memory.low values set on host pods are actually quite large:
some pods are set to 10GB, others to 20GB, and so on.
2. Since most pods have memory limits configured, each time kswapd
is woken up, if a pod's memory usage hasn't exceeded its own
memory.low, its memory won't be reclaimed.
3. When applications start up, rapidly consume memory, or experience
network traffic bursts, the kernel reaches steal_suitable_fallback(),
which sets watermark_boost and subsequently wakes kswapd.
4. In the core logic of the kswapd thread (balance_pgdat()), when reclaim is
triggered by watermark_boost, the scan priority is not lowered past
DEF_PRIORITY - 2 (that is, 10). Higher priority values mean less aggressive
LRU scanning, which can result in no pages being reclaimed during a single
scan cycle:

	if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
		raise_priority = false;

5. This eventually causes pgdat->kswapd_failures to continuously accumulate,
exceeding MAX_RECLAIM_RETRIES, and consequently kswapd stops working.
At this point, the system's available memory is still significantly above
the high watermark; it's inappropriate for kswapd to stop under these
conditions.
The final observable issue is that a brief period of rapid memory allocation
causes kswapd to stop running, ultimately triggering direct reclaim and
making the applications unresponsive.
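
To make points 4 and 5 concrete, here is a minimal userspace sketch of the
failure accounting. It is a toy model, not kernel code: toy_node,
toy_scan_target(), toy_balance(), toy_kswapd_stopped() and the
skip_failure_on_boost flag are hypothetical names made up for illustration.
Only DEF_PRIORITY (12) and MAX_RECLAIM_RETRIES (16) mirror the kernel's
values. The model assumes the scan target is roughly lru_size >> priority
and that boosted reclaim stops at DEF_PRIORITY - 2, as described in point 4;
the real balance_pgdat() has more exit conditions than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY        12	/* mirrors the kernel's value */
#define MAX_RECLAIM_RETRIES 16	/* mirrors the kernel's value */

/* Hypothetical stand-in for a node; not the kernel's pg_data_t. */
struct toy_node {
	unsigned long reclaimable;	/* pages not protected by memory.low */
	int kswapd_failures;
};

/* Assumed scan-target scaling: a higher priority value scans fewer pages. */
static unsigned long toy_scan_target(unsigned long lru_size, int priority)
{
	return lru_size >> priority;
}

/*
 * One simplified balance pass. A boosted pass never goes below
 * DEF_PRIORITY - 2; a normal pass may go all the way down to 1.
 * skip_failure_on_boost models the idea of not counting a fruitless
 * boosted pass as a kswapd failure.
 */
static unsigned long toy_balance(struct toy_node *node, bool boosted,
				 bool skip_failure_on_boost)
{
	int min_prio = boosted ? DEF_PRIORITY - 2 : 1;
	unsigned long reclaimed = 0;

	assert(node != NULL);

	for (int prio = DEF_PRIORITY; prio >= min_prio && reclaimed == 0; prio--)
		reclaimed = toy_scan_target(node->reclaimable, prio);

	if (reclaimed > 0) {
		node->reclaimable -= reclaimed;
		node->kswapd_failures = 0;
	} else if (!(boosted && skip_failure_on_boost)) {
		node->kswapd_failures++;
	}
	return reclaimed;
}

/* In this model kswapd gives up once the failure count hits the limit. */
static bool toy_kswapd_stopped(const struct toy_node *node)
{
	return node->kswapd_failures >= MAX_RECLAIM_RETRIES;
}
```

With only 512 unprotected pages, every boosted pass in this model scans
nothing at priorities 12 through 10, so 16 consecutive boosted wake-ups are
enough for toy_kswapd_stopped() to return true. With skip_failure_on_boost
set, the counter stays at zero and kswapd keeps running.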