Message-ID: <59f26b5b0c49f8d0f3bdb33f99d69dd3d442ed60@linux.dev>
Date: Wed, 12 Nov 2025 02:21:37 +0000
From: "Jiayuan Chen" <jiayuan.chen@...ux.dev>
To: "Shakeel Butt" <shakeel.butt@...ux.dev>
Cc: linux-mm@...ck.org, "Andrew Morton" <akpm@...ux-foundation.org>,
"Johannes Weiner" <hannes@...xchg.org>, "David Hildenbrand"
<david@...hat.com>, "Michal Hocko" <mhocko@...nel.org>, "Qi Zheng"
<zhengqi.arch@...edance.com>, "Lorenzo Stoakes"
<lorenzo.stoakes@...cle.com>, "Axel Rasmussen"
<axelrasmussen@...gle.com>, "Yuanchu Xie" <yuanchu@...gle.com>, "Wei Xu"
<weixugc@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/vmscan: skip increasing kswapd_failures when
reclaim was boosted
2025/11/8 09:11, "Shakeel Butt" <shakeel.butt@...ux.dev> wrote:
>
> On Fri, Oct 24, 2025 at 10:27:11AM +0800, Jiayuan Chen wrote:
[...]
> >
> Can you share the numa configuration of your system? How many nodes are
> there?
My system has 2 nodes.
[...]
> > if (nr_boost_reclaim && sc.priority == DEF_PRIORITY - 2)
> > raise_priority = false;
> >
> Am I understanding this correctly that watermark boost increase the
> chances of this issue but it can still happen?
Yes. With watermark boost, the priority has a lower limit (DEF_PRIORITY - 2), so the
scanning intensity stays relatively low, which makes this issue more likely to occur
even though I have not configured memory.low. However, the issue can theoretically
happen without watermark boost as well, for example if the memory.low values for all
pods are set very high. I would consider that a configuration error, though (based on
the current logic where kswapd does not attempt to reclaim memory whose usage is below memory.low,
[...]
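To put rough numbers on the scan intensity mentioned above, here is a minimal
user-space sketch (not the kernel code; the LRU size and page size are only
assumptions for illustration) of how the per-pass scan target shrinks with
priority when it is held at the DEF_PRIORITY - 2 floor while boosting:

#include <stdio.h>

#define DEF_PRIORITY 12

int main(void)
{
        unsigned long lru_pages = 1UL << 20;   /* assumption: ~4 GiB of 4K pages on one LRU */
        int boost_floor = DEF_PRIORITY - 2;    /* priority is not pushed below this while boosted */

        /*
         * The per-pass scan target is roughly lru_size >> priority, so only
         * about 1K of ~1M pages are even considered at the floor, and
         * memory.low protection can shrink that further, down to nothing.
         */
        printf("scan target at priority %d: %lu pages\n",
               boost_floor, lru_pages >> boost_floor);
        return 0;
}
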
> > - if (!sc.nr_reclaimed)
> > + /*
> > + * If the reclaim was boosted, we might still be far from the
> > + * watermark_high at this point. We need to avoid increasing the
> > + * failure count to prevent the kswapd thread from stopping.
> > + */
> > + if (!sc.nr_reclaimed && !boosted)
> > atomic_inc(&pgdat->kswapd_failures);
> >
> In general I think not incrementing the failure for boosted kswapd
> iteration is right.
Thanks. I applied this change as a livepatch, and it did prevent direct memory
reclaim from being triggered.
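For reference, here is a minimal user-space model of the failure accounting as I
understand it (simplified, not the exact mainline code; MAX_RECLAIM_RETRIES and the
function names are stand-ins from my reading of vmscan): once the counter saturates,
kswapd wakeups stop and allocations fall back to direct reclaim, and the change above
simply keeps boosted no-progress passes from counting toward that.

#include <stdbool.h>
#include <stdio.h>

#define MAX_RECLAIM_RETRIES 16

static int kswapd_failures;

/* one balance_pgdat()-style pass, reduced to the accounting that matters here */
static void kswapd_pass(unsigned long nr_reclaimed, bool boosted)
{
        if (nr_reclaimed) {
                kswapd_failures = 0;    /* any progress resets the counter */
                return;
        }
        if (!boosted)                   /* proposed change: boosted passes don't count */
                kswapd_failures++;
}

static bool kswapd_wakeup_allowed(void)
{
        /* once saturated, wakeups are skipped and allocators do direct reclaim */
        return kswapd_failures < MAX_RECLAIM_RETRIES;
}

int main(void)
{
        for (int i = 0; i < 20; i++)
                kswapd_pass(0, true);   /* boosted passes with no progress */
        printf("wakeups allowed after boosted no-progress passes: %d\n",
               kswapd_wakeup_allowed());

        for (int i = 0; i < 20; i++)
                kswapd_pass(0, false);  /* unboosted passes with no progress */
        printf("wakeups allowed after unboosted no-progress passes: %d\n",
               kswapd_wakeup_allowed());
        return 0;
}
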
> If this issue (high protection causing kswap
> failures) happen on non-boosted case, I am not sure what should be right
> behavior i.e. allocators doing direct reclaim potentially below low
> protection or allowing kswapd to reclaim below low. For min, it is very
> clear that direct reclaimer has to reclaim as they may have to trigger
> oom-kill. For low protection, I am not sure.
>
We have also encountered this issue in non-boosted scenarios. For instance, with swap
disabled (so only file pages can be reclaimed, not anonymous pages), it occurred even
without memory.low configured, especially when anonymous pages made up the majority of
memory.
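To make that concrete, a rough sketch (simplified: it ignores demotion between memory
tiers, and the numbers are made up): with swap disabled, the anon LRUs are not reclaim
candidates at all, so an anon-heavy node leaves kswapd with almost nothing to reclaim.

#include <stdbool.h>
#include <stdio.h>

/* rough model: with swap disabled, only file pages are reclaim candidates */
static unsigned long reclaim_candidates(unsigned long anon_pages,
                                        unsigned long file_pages,
                                        bool swap_available)
{
        return file_pages + (swap_available ? anon_pages : 0);
}

int main(void)
{
        /* made-up numbers: an anon-heavy node with a small file cache */
        printf("candidates, swap off: %lu pages\n",
               reclaim_candidates(900000, 20000, false));
        printf("candidates, swap on:  %lu pages\n",
               reclaim_candidates(900000, 20000, true));
        return 0;
}
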
Another scenario is a misconfigured memory.low. In our production environment, however,
the memory.low configurations are generally reasonable: the sum of all low values is only
about half of the system's total memory.
Regarding how memory.low should be handled, I believe there is still room for optimization
in kswapd. From an administrator's perspective, we typically set memory.low as a percentage
of memory.max (applications iterate quickly, and usually nobody knows the exact optimal
threshold for low).
Furthermore, to make the low protection as effective as possible, memory.low values tend
to be set on the higher side. This inevitably leaves a significant amount of reclaimable
memory unreclaimed. In the scenarios I have encountered, memory.low, although intended as
a soft limit, does not behave very "softly" in practice. Addressing that was also the goal
of the v1 patch, although more refined work may still be needed.