linux-kernel - Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with memory.low in kswapd

Message-ID: <db4d9e73e6a70033da561ed88aef32c1ebe411dd@linux.dev>
Date: Mon, 20 Oct 2025 10:11:23 +0000
From: "Jiayuan Chen" <jiayuan.chen@...ux.dev>
To: "Michal Hocko" <mhocko@...e.com>
Cc: linux-mm@...ck.org, "Andrew Morton" <akpm@...ux-foundation.org>, "Axel
 Rasmussen" <axelrasmussen@...gle.com>, "Yuanchu Xie"
 <yuanchu@...gle.com>, "Wei Xu" <weixugc@...gle.com>, "Johannes Weiner"
 <hannes@...xchg.org>, "David Hildenbrand" <david@...hat.com>, "Qi Zheng"
 <zhengqi.arch@...edance.com>, "Shakeel Butt" <shakeel.butt@...ux.dev>,
 "Lorenzo Stoakes" <lorenzo.stoakes@...cle.com>,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] mm/vmscan: Add retry logic for cgroups with
 memory.low in kswapd

October 17, 2025 at 02:43, "Michal Hocko" <mhocko@...e.com> wrote:

> 
> On Thu 16-10-25 15:10:31, Jiayuan Chen wrote:
> [...]
> 
> > 
> > The issue we encountered is that since the watermark_boost parameter is enabled by
> >  default, it causes kswapd to be woken up even when memory watermarks are still relatively
> >  high. Due to rapid consecutive wake-ups, kswapd_failures eventually reaches MAX_RECLAIM_RETRIES,
> >  causing kswapd to stop running, which ultimately triggers direct memory reclaim.
> > 
> >  I believe we should choose another approach that avoids breaking the memory.low semantics.
> >  Specifically, in cases where kswapd is woken up due to watermark_boost, we should bypass the
> >  logic that increments kswapd_failures.
> > 
> yes, this seems like unintended side effect of the implementation. Seems
> like a rare problem as low limits would have to be configured very close
> to kswapd watermarks. My assumption has always been that low limits are
> not getting very close to watermarks because that makes any reclaim very
> hard and configuration rather unstable but you might have a very good
> reason to configure the memory protection that way. It would definitely
> help to describe your specific setup with rationale so that we can look
> into that closer.
> -- 
> Michal Hocko
> SUSE Labs
>

Thank you for your response, Michal.

To provide more context about our specific setup:

1. The memory.low values set on the pods on our hosts are quite large:
   some pods are set to 10GB, others to 20GB, and so on.
2. Most pods have these limits configured, so each time kswapd is
   woken up, any pod whose memory usage has not exceeded its own
   memory.low has none of its memory reclaimed.
3. When applications start up, rapidly consume memory, or experience
   network traffic bursts, the kernel reaches steal_suitable_fallback(),
   which sets watermark_boost and subsequently wakes kswapd.
4. In the core logic of the kswapd thread (balance_pgdat()), when reclaim
   is triggered by watermark_boost, the scan priority stops at
   DEF_PRIORITY - 2 (10) and is not raised any further. Higher priority
   values mean less aggressive LRU scanning, so a single scan cycle can
   end without reclaiming any pages:

   if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
           raise_priority = false;

5. As a result, pgdat->kswapd_failures keeps accumulating until it
   reaches MAX_RECLAIM_RETRIES, and kswapd then stops working. At this
   point, the system's available memory is still significantly above the
   high watermark, so it is inappropriate for kswapd to stop under these
   conditions.
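
To make the sequence in steps 1-5 concrete, here is a toy userspace model.
It is only a sketch under simplifying assumptions, not kernel code: the
toy_* structures and functions are hypothetical, the reclaim loop is far
simpler than balance_pgdat(), and only DEF_PRIORITY (12) and
MAX_RECLAIM_RETRIES (16) are taken from the kernel. The skip_boost_failures
flag is likewise hypothetical and models the bypass suggested earlier in
this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY        12  /* kernel default scan priority */
#define MAX_RECLAIM_RETRIES 16  /* kernel limit on failed reclaim runs */

/* Hypothetical stand-in for a pod's cgroup; sizes are in pages. */
struct toy_cgroup {
	unsigned long usage;
	unsigned long low;	/* memory.low protection */
};

/* Hypothetical stand-in for the per-node state kswapd consults. */
struct toy_node {
	int kswapd_failures;
};

/* One simplified reclaim run; returns the number of pages reclaimed. */
static unsigned long toy_reclaim_run(struct toy_cgroup *cg, size_t n,
				     bool boosted)
{
	/* Boosted runs never get more aggressive than DEF_PRIORITY - 2. */
	int floor = boosted ? DEF_PRIORITY - 2 : 1;
	unsigned long reclaimed = 0;

	for (int prio = DEF_PRIORITY; prio >= floor && !reclaimed; prio--) {
		for (size_t i = 0; i < n; i++) {
			unsigned long scan, excess, take;

			if (cg[i].usage <= cg[i].low)
				continue;	/* protected by memory.low */

			scan = cg[i].usage >> prio;
			excess = cg[i].usage - cg[i].low;
			take = scan < excess ? scan : excess;
			cg[i].usage -= take;
			reclaimed += take;
		}
	}
	return reclaimed;
}

/* One kswapd wake-up; skip_boost_failures models the proposed bypass. */
static void toy_kswapd_wakeup(struct toy_node *node, struct toy_cgroup *cg,
			      size_t n, bool boosted, bool skip_boost_failures)
{
	assert(node != NULL && cg != NULL);

	if (toy_reclaim_run(cg, n, boosted) == 0) {
		/* No progress: count a failure unless the bypass applies. */
		if (!(boosted && skip_boost_failures))
			node->kswapd_failures++;
	} else {
		node->kswapd_failures = 0;	/* any progress resets it */
	}
}

/* True once the failure counter has reached the retry limit. */
static bool toy_kswapd_gave_up(const struct toy_node *node)
{
	return node->kswapd_failures >= MAX_RECLAIM_RETRIES;
}
```

In this model, with every pod below its memory.low, sixteen boosted
wake-ups in a row are enough for toy_kswapd_gave_up() to return true; with
skip_boost_failures set, the counter stays at zero.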

The end result is that a brief period of rapid memory allocation causes
kswapd to stop running, which ultimately triggers direct reclaim and
makes the applications unresponsive.