Message-ID: <4owaeb7bmkfgfzqd4ztdsi4tefc36cnmpju4yrknsgjm4y32ez@qsgn6lnv3cxb>
Date: Mon, 22 Dec 2025 13:15:05 -0800
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Jiayuan Chen <jiayuan.chen@...ux.dev>
Cc: linux-mm@...ck.org, Jiayuan Chen <jiayuan.chen@...pee.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Johannes Weiner <hannes@...xchg.org>, 
	David Hildenbrand <david@...nel.org>, Michal Hocko <mhocko@...nel.org>, 
	Qi Zheng <zhengqi.arch@...edance.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	Axel Rasmussen <axelrasmussen@...gle.com>, Yuanchu Xie <yuanchu@...gle.com>, Wei Xu <weixugc@...gle.com>, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] mm/vmscan: mitigate spurious kswapd_failures reset
 from direct reclaim

On Mon, Dec 22, 2025 at 08:20:21PM +0800, Jiayuan Chen wrote:
> From: Jiayuan Chen <jiayuan.chen@...pee.com>
> 
> When kswapd fails to reclaim memory, kswapd_failures is incremented.
> Once it reaches MAX_RECLAIM_RETRIES, kswapd stops running to avoid
> futile reclaim attempts. However, any successful direct reclaim
> unconditionally resets kswapd_failures to 0, which can keep kswapd
> running indefinitely on a node it cannot balance.
> 
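> A simplified sketch of the relevant mm/vmscan.c logic (paraphrased
> from recent kernels, not a verbatim quote; field types and locking
> elided):
> 
> 	/* balance_pgdat(): a run that reclaims nothing counts as
> 	 * one kswapd failure on this node. */
> 	if (!sc.nr_reclaimed)
> 		pgdat->kswapd_failures++;
> 
> 	/* shrink_node(): any reclaim progress, including progress
> 	 * made by direct reclaim, resets the counter and thereby
> 	 * revives a dormant kswapd. */
> 	if (reclaimable)
> 		pgdat->kswapd_failures = 0;
> 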
> We observed an issue in production on a multi-NUMA system where a
> process allocated large amounts of anonymous pages on a single NUMA
> node, pushing the node's free memory below the high watermark and
> evicting most of its file pages:
> 
> $ numastat -m
> Per-node system memory usage (in MBs):
>                           Node 0          Node 1           Total
>                  --------------- --------------- ---------------
> MemTotal               128222.19       127983.91       256206.11
> MemFree                  1414.48         1432.80         2847.29
> MemUsed                126807.71       126551.11       252358.82
> SwapCached                  0.00            0.00            0.00
> Active                  29017.91        25554.57        54572.48
> Inactive                92749.06        95377.00       188126.06
> Active(anon)            28998.96        23356.47        52355.43
> Inactive(anon)          92685.27        87466.11       180151.39
> Active(file)               18.95         2198.10         2217.05
> Inactive(file)             63.79         7910.89         7974.68
> 
> With swap disabled, only file pages can be reclaimed. When kswapd is
> woken (e.g., via wake_all_kswapds()), it runs continuously but cannot
> raise free memory above the high watermark since reclaimable file pages
> are insufficient. Normally, kswapd would eventually stop after
> kswapd_failures reaches MAX_RECLAIM_RETRIES.
> 
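> For reference, the check that lets kswapd stop once the counter
> saturates sits in its sleep logic (again a paraphrased sketch, not
> verbatim kernel code):
> 
> 	/* prepare_kswapd_sleep(): hopeless node, leave it to
> 	 * direct reclaim. */
> 	if (pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES)
> 		return true;
> 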
> However, pods on this machine have memory.high set in their cgroup.
> Business processes continuously trigger the high limit, causing frequent
> direct reclaim that keeps resetting kswapd_failures to 0. This prevents
> kswapd from ever stopping.
> 
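> Roughly, the path that keeps resetting the counter looks like the
> following (an approximate call chain, not verbatim kernel code):
> 
> 	/* memory.high breach in the cgroup triggers memcg reclaim */
> 	reclaim_high()
> 	  -> try_to_free_mem_cgroup_pages()
> 	    -> do_try_to_free_pages()
> 	      -> shrink_node()
> 	        -> pgdat->kswapd_failures = 0;  /* reset on any progress */
> 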
> The result is that kswapd runs endlessly, repeatedly evicting the few
> remaining file pages, which are actually hot. These pages constantly
> refault, generating sustained, heavy read IO pressure.

I don't think kswapd is an issue here. The system is out of memory and
most of the memory is unreclaimable. Either change the workload to use
less memory or enable swap (or zswap) to have more reclaimable memory.

Other than that, we can discuss whether memcg reclaim resetting the
kswapd failure count should be changed or not, but that is a separate
discussion.

