Message-ID: <lqcwz26ajddhoaq3gfuyt3mzwfn7waeqksoz6k55jrtn32u42z@pdf3dor7ntb6>
Date: Thu, 13 Nov 2025 15:47:01 -0800
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Jiayuan Chen <jiayuan.chen@...ux.dev>
Cc: linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>, 
	Johannes Weiner <hannes@...xchg.org>, David Hildenbrand <david@...hat.com>, 
	Michal Hocko <mhocko@...nel.org>, Qi Zheng <zhengqi.arch@...edance.com>, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Axel Rasmussen <axelrasmussen@...gle.com>, 
	Yuanchu Xie <yuanchu@...gle.com>, Wei Xu <weixugc@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/vmscan: skip increasing kswapd_failures when
 reclaim was boosted

On Fri, Oct 24, 2025 at 10:27:11AM +0800, Jiayuan Chen wrote:
> We encountered a scenario where direct memory reclaim was triggered,
> leading to increased system latency:
> 
> 1. The memory.low values set on host pods are actually quite large, some
>    pods are set to 10GB, others to 20GB, etc.
> 2. Since most pods have memory protection configured, each time kswapd is
>    woken up, if a pod's memory usage hasn't exceeded its own memory.low,
>    its memory won't be reclaimed.
> 3. When applications start up, rapidly consume memory, or experience
>    network traffic bursts, the kernel reaches steal_suitable_fallback(),
>    which sets watermark_boost and subsequently wakes kswapd.
> 4. In the core logic of kswapd thread (balance_pgdat()), when reclaim is
>    triggered by watermark_boost, the maximum priority is 10. Higher
>    priority values mean less aggressive LRU scanning, which can result in
>    no pages being reclaimed during a single scan cycle:
> 
> if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
>     raise_priority = false;
> 
> 5. This eventually causes pgdat->kswapd_failures to continuously
>    accumulate, exceeding MAX_RECLAIM_RETRIES, and consequently kswapd stops
>    working. At this point, the system's available memory is still
>    significantly above the high watermark — it's inappropriate for kswapd
>    to stop under these conditions.
> 
> The final observable issue is that a brief period of rapid memory
> allocation causes kswapd to stop running, ultimately triggering direct
> reclaim and making the applications unresponsive.
> 
> Signed-off-by: Jiayuan Chen <jiayuan.chen@...ux.dev>
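
The failure accounting described in steps 4 and 5 can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code or the patch itself: `struct toy_pgdat`, `toy_balance()`, `toy_kswapd_gave_up()`, and the `skip_failure_when_boosted` flag are hypothetical names, the scan size is approximated as `lru_pages >> priority`, and pages protected by memory.low are modelled simply as "not reclaimable". The values of DEF_PRIORITY (12) and MAX_RECLAIM_RETRIES (16) match the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY        12  /* starting scan priority, as in the kernel */
#define MAX_RECLAIM_RETRIES 16  /* failures after which kswapd gives up */

/* Toy stand-in for pgdat: only the failure counter is modelled. */
struct toy_pgdat {
    int kswapd_failures;
};

/*
 * Hypothetical model of one kswapd run.
 * lru_pages:   total pages on the LRU lists
 * reclaimable: pages not protected by memory.low
 * boosted:     run was triggered by watermark_boost
 * skip_failure_when_boosted: models the behaviour the patch proposes
 * Returns the number of pages reclaimed in this run.
 */
static unsigned long toy_balance(struct toy_pgdat *pgdat,
                                 unsigned long lru_pages,
                                 unsigned long reclaimable,
                                 bool boosted,
                                 bool skip_failure_when_boosted)
{
    assert(pgdat != NULL);

    unsigned long nr_reclaimed = 0;
    /* A boosted run never scans harder than DEF_PRIORITY - 2. */
    int lowest_prio = boosted ? DEF_PRIORITY - 2 : 1;

    for (int prio = DEF_PRIORITY; prio >= lowest_prio; prio--) {
        /* Lower priority value means a larger scan window. */
        unsigned long scan = lru_pages >> prio;
        unsigned long got = scan < reclaimable ? scan : reclaimable;

        nr_reclaimed += got;
        if (nr_reclaimed)
            break;  /* made progress, stop escalating */
    }

    /* Without the skip, every fruitless run counts as a failure. */
    if (!nr_reclaimed && !(boosted && skip_failure_when_boosted))
        pgdat->kswapd_failures++;

    return nr_reclaimed;
}

/* kswapd is treated as hopeless once the counter hits the limit. */
static bool toy_kswapd_gave_up(const struct toy_pgdat *pgdat)
{
    return pgdat->kswapd_failures >= MAX_RECLAIM_RETRIES;
}
```

In this model, 16 consecutive boosted runs against a node where every page is protected (reclaimable == 0) push kswapd_failures to MAX_RECLAIM_RETRIES when the skip is off, while the counter stays at 0 when the skip is on.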

Please resolve Andrew's comment, and add a couple of lines to the commit
message saying that boosted watermarks increase the chances of kswapd
failures, that this patch targets only that particular scenario, and that
the general solution is TBD.

With that, you can add:

Reviewed-by: Shakeel Butt <shakeel.butt@...ux.dev>

