Open Source and information security mailing list archives
 
Message-ID: <53de0b3ee0b822418e909db29bfa6513faff9d36@linux.dev>
Date: Fri, 14 Nov 2025 04:17:40 +0000
From: "Jiayuan Chen" <jiayuan.chen@...ux.dev>
To: "Shakeel Butt" <shakeel.butt@...ux.dev>, "Andrew Morton"
 <akpm@...ux-foundation.org>
Cc: linux-mm@...ck.org, "Andrew Morton" <akpm@...ux-foundation.org>,
 "Johannes Weiner" <hannes@...xchg.org>, "David Hildenbrand"
 <david@...hat.com>, "Michal Hocko" <mhocko@...nel.org>, "Qi Zheng"
 <zhengqi.arch@...edance.com>, "Lorenzo Stoakes"
 <lorenzo.stoakes@...cle.com>, "Axel Rasmussen"
 <axelrasmussen@...gle.com>, "Yuanchu Xie" <yuanchu@...gle.com>, "Wei Xu"
 <weixugc@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/vmscan: skip increasing kswapd_failures when 
 reclaim was boosted

November 14, 2025 at 07:47, "Shakeel Butt" <shakeel.butt@...ux.dev> wrote:


[...]
> >  The final observable issue is that a brief period of rapid memory
> >  allocation causes kswapd to stop running, ultimately triggering direct
> >  reclaim and making the applications unresponsive.
> >  
> >  Signed-off-by: Jiayuan Chen <jiayuan.chen@...ux.dev>
> > 
> Please resolve Andrew's comment and add a couple of lines to the commit
> message noting that the boosted watermark increases the chances of
> kswapd failures, that the patch only targets that particular scenario,
> and that the general solution is TBD.
> 
> With that, you can add:
> 
> Reviewed-by: Shakeel Butt <shakeel.butt@...ux.dev>
>

I see this patch is already in mm-next, so I'm not sure how best to proceed.
Perhaps Andrew needs to do a git rebase and reword the commit message?
Either way, I've reworded the commit message below; please let me know
how you'd like me to proceed:

'''
mm/vmscan: skip increasing kswapd_failures when reclaim was boosted

We have a colocation cluster used for deploying both offline and online
services simultaneously. In this environment, we encountered a scenario
where direct reclaim was triggered because kswapd had stopped running.

1. When applications start up, rapidly consume memory, or experience
   network traffic bursts, the kernel reaches steal_suitable_fallback(),
   which sets watermark_boost and subsequently wakes kswapd.

2. In the core logic of the kswapd thread (balance_pgdat()), when reclaim
   is triggered by watermark_boost, sc.priority is not lowered below
   DEF_PRIORITY - 2, i.e. 10. Since higher priority values mean less
   aggressive LRU scanning, a single scan cycle can end up reclaiming no
   pages at all:

   if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
       raise_priority = false;

3. Additionally, many of our pods are configured with memory.low, which
   protects the memory of certain cgroups from reclaim, further increasing
   the chance of failing to reclaim memory.

4. This eventually causes pgdat->kswapd_failures to keep accumulating
   until it reaches MAX_RECLAIM_RETRIES, at which point kswapd stops
   working. Yet the system's available memory is still significantly
   above the high watermark, so it is inappropriate for kswapd to stop
   under these conditions.

The final observable issue is that a brief period of rapid memory
allocation causes kswapd to stop running, ultimately triggering direct
reclaim and making the applications unresponsive.

This problem, which leads to direct reclaim, has been a long-standing
issue in our production environment. We initially assumed it was simply
caused by applications allocating memory faster than kswapd could reclaim
it. However, after we began monitoring kswapd's runtime behavior, we
discovered a different pattern:

  kswapd initially exhibits very aggressive activity even when there is
  still considerable free memory, but it subsequently stops running
  entirely, even as memory levels approach the low watermark.

In summary, both boosted watermarks and memory.low increase the probability
of kswapd reclaim failures.

This patch specifically addresses the boosted-watermark scenario by not
incrementing kswapd_failures when boosted reclaim fails. A more general
solution, potentially covering memory.low or other cases, requires further
discussion.

Link: https://lkml.kernel.org/r/20251024022711.382238-1-jiayuan.chen@linux.dev
Reviewed-by: Shakeel Butt <shakeel.butt@...ux.dev>
Signed-off-by: Jiayuan Chen <jiayuan.chen@...ux.dev>
Cc: Axel Rasmussen <axelrasmussen@...gle.com>
Cc: David Hildenbrand <david@...hat.com>
Cc: Johannes Weiner <hannes@...xchg.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Michal Hocko <mhocko@...nel.org>
Cc: Qi Zheng <zhengqi.arch@...edance.com>
Cc: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Wei Xu <weixugc@...gle.com>
Cc: Yuanchu Xie <yuanchu@...gle.com>
Signed-off-by: Andrew Morton <akpm@...ux-foundation.org>

'''
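To make the accounting in steps 2 and 4 above concrete, here is a minimal toy model in C. It is a simplification, not the patch itself. Only DEF_PRIORITY (12) and MAX_RECLAIM_RETRIES (16) are kernel values; struct toy_node, scan_window(), account_run(), kswapd_gave_up() and simulate_failed_boosts() are hypothetical names. The model assumes a pass scans roughly lru_size >> priority pages, and that a pass which reclaims anything resets the failure counter, loosely mirroring what the kernel does on reclaim progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel constants; everything else below is a hypothetical toy model. */
#define DEF_PRIORITY 12
#define MAX_RECLAIM_RETRIES 16

/* Approximate number of pages one LRU pass looks at: lru_size >> priority.
 * At DEF_PRIORITY - 2 (the floor for boosted reclaim) that is 1/1024. */
static unsigned long scan_window(unsigned long lru_size, int priority)
{
    assert(priority >= 0 && priority <= DEF_PRIORITY);
    return lru_size >> priority;
}

/* Hypothetical stand-in for the per-node pgdat state. */
struct toy_node {
    int kswapd_failures;
};

/* Account one balance_pgdat()-style pass that reclaimed nr_reclaimed pages.
 * If fixed is true, a failed boosted pass is not counted as a failure,
 * which is the behavior the patch subject describes. */
static void account_run(struct toy_node *node, unsigned long nr_reclaimed,
                        bool boosted, bool fixed)
{
    if (nr_reclaimed)
        node->kswapd_failures = 0;      /* progress resets the counter */
    else if (!(fixed && boosted))
        node->kswapd_failures++;        /* counted as a kswapd failure */
}

/* In this model kswapd gives up once failures reach MAX_RECLAIM_RETRIES. */
static bool kswapd_gave_up(const struct toy_node *node)
{
    return node->kswapd_failures >= MAX_RECLAIM_RETRIES;
}

/* Run n consecutive boosted passes that reclaim nothing; return the count. */
static int simulate_failed_boosts(int n, bool fixed)
{
    struct toy_node node = { .kswapd_failures = 0 };

    for (int i = 0; i < n; i++)
        account_run(&node, 0, true, fixed);
    return node.kswapd_failures;
}
```

Without the fix, MAX_RECLAIM_RETRIES consecutive failed boosted passes are enough for kswapd_gave_up() to return true, even though free memory may still be above the high watermark. With fixed set, those passes leave the counter at 0, while failed unboosted passes are still counted.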
