Message-ID: <langyedbbu7b4zkz5o7yy7m7bdlusoa3zwsjbgrqt2p7ou37qm@fi7rovfl5gfz>
Date: Mon, 12 Jan 2026 13:29:06 -0800
From: Shakeel Butt <shakeel.butt@...ux.dev>
To: Jiayuan Chen <jiayuan.chen@...ux.dev>
Cc: Michal Hocko <mhocko@...e.com>, linux-mm@...ck.org, 
	Jiayuan Chen <jiayuan.chen@...pee.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Johannes Weiner <hannes@...xchg.org>, David Hildenbrand <david@...nel.org>, 
	Qi Zheng <zhengqi.arch@...edance.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, 
	Axel Rasmussen <axelrasmussen@...gle.com>, Yuanchu Xie <yuanchu@...gle.com>, Wei Xu <weixugc@...gle.com>, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm/vmscan: mitigate spurious kswapd_failures reset
 from direct reclaim

Hi Jiayuan,

Sorry for the late reply. Let me respond inline below.

On Wed, Jan 07, 2026 at 11:39:36AM +0000, Jiayuan Chen wrote:
[...]
> 
> Hi Shakeel,
> 
> Thanks for the feedback.
> 
> To be honest, the issue is difficult to reproduce because the boundary conditions are quite complex.
> We also haven't deployed this patch in production yet. I discovered the relationship between
> kswapd_failures and direct reclaim through the following bpftrace script:
> 
> '''bash
> 
> bpftrace -e '
> #include <linux/mmzone.h>
> #include <linux/shrinker.h>
> kprobe:balance_pgdat {
> 	$pgdat = (struct pglist_data *)arg0;
> 	if ($pgdat->kswapd_failures > 0) {
> 		printf("[node %d] [%lu] kswapd end, kswapd_failures %d\n", $pgdat->node_id, jiffies, $pgdat->kswapd_failures);
> 	}
> }
> tracepoint:vmscan:mm_vmscan_direct_reclaim_end {
> 	/* any direct reclaim progress resets kswapd_failures to 0 */
> 	printf("[cpu %d] [%lu] direct reclaim end, nr_reclaimed %lu\n",
> 	       cpu, jiffies, args.nr_reclaimed);
> }
> '
> 
> '''
> 
> The trace results showed that when kswapd_failures reaches 15, continuous direct reclaim keeps
> resetting it to 0. This was accompanied by a flood of kswapd_failures log entries, and shortly
> after, we observed massive refaults occurring.
> (Note that I can only observe up to 15 in the trace due to a kprobe limitation:
> the kprobe on balance_pgdat fires at function entry, but kswapd_failures is incremented to 16 only
> when balance_pgdat fails to reclaim any pages - at which point kswapd goes to sleep and there's no
> suitable hook point to capture it.)
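
By the way, for that tracing limitation: a kretprobe should work, since
the counter is incremented before balance_pgdat() returns. Something
like the following (untested sketch, assuming balance_pgdat has a
probeable symbol on your kernel, which your kprobe above suggests it
does) should capture the post-increment value of 16:

bpftrace -e '
#include <linux/mmzone.h>
kprobe:balance_pgdat {
	/* stash the pgdat so the return probe can read it */
	@pgdat[tid] = (struct pglist_data *)arg0;
}
kretprobe:balance_pgdat {
	$pgdat = @pgdat[tid];
	printf("[node %d] [%lu] kswapd_failures now %d\n",
	       $pgdat->node_id, jiffies, $pgdat->kswapd_failures);
	delete(@pgdat[tid]);
}
'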
> 
> 
> Before I send v3, I'd like to continue the discussion to make sure we're aligned on the approach:
> 
>     Do you think the bpftrace evidence above is sufficient?

Mainly I want to see whether the patch is contributing positively or
negatively in the situation you are seeing in your production
environment. Overall I think Michal and I are on the same page that the
patch is a net positive, but testing it in production would eliminate
the concerns completely. Anyway, we can proceed with the patch and we
can always change it in the future if it does not work. Please go ahead
with v3 with the additional explanation.
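
FWIW, for anyone following along: the reset in question is the one in
shrink_node(), where any reclaimer that makes progress clears the
counter (which is how a successful direct reclaim run revives a dormant
kswapd). One way to mitigate the spurious reset from direct reclaim
would be to gate it on kswapd context, e.g. (just a sketch, not the
actual v2 diff):

	/* today: any reclaim progress clears the counter */
	if (reclaimable)
		pgdat->kswapd_failures = 0;

	/* sketch: only let kswapd itself clear it */
	if (reclaimable && current_is_kswapd())
		pgdat->kswapd_failures = 0;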

> 
> 
> If you and Michal are okay with the current approach, I'll prepare v3 with more detailed comments and the review feedback addressed.
> 
> By the way, this tracing limitation makes me wonder: would it be appropriate to add two tracepoints for
> kswapd_failures? One for when kswapd_failures reaches MAX_RECLAIM_RETRIES (16), and another for when it
> gets reset to 0. Currently, the only way to detect this is by polling node_unreclaimable from /proc/zoneinfo,
> but the sampling interval is usually too coarse to catch these events.

Tracepoints are cheap and I am all for more observability. Go ahead and
propose whatever tracepoints you see fit.
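
If it helps, something in the style of the existing events in
include/trace/events/vmscan.h would do. Just a sketch with made-up
names and fields, pick whatever reads best to you:

TRACE_EVENT(mm_vmscan_kswapd_failures,

	TP_PROTO(int nid, int failures),

	TP_ARGS(nid, failures),

	TP_STRUCT__entry(
		__field(int, nid)
		__field(int, failures)
	),

	TP_fast_assign(
		__entry->nid = nid;
		__entry->failures = failures;
	),

	TP_printk("nid=%d kswapd_failures=%d",
		__entry->nid, __entry->failures)
);

A single event like this, emitted both where the counter is incremented
and where it is reset, would cover the two cases you mention; or split
it into two events if that turns out to be cleaner.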
