linux-kernel - [PATCH v5 3/3] selftests: cgroup: Replace sleep with cg_read_key_long_poll() for waiting on nr_dying

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20251203115631.947908-4-zhangguopeng@kylinos.cn>
Date: Wed,  3 Dec 2025 19:56:31 +0800
From: Guopeng Zhang <zhangguopeng@...inos.cn>
To: tj@...nel.org,
	hannes@...xchg.org,
	mkoutny@...e.com,
	shuah@...nel.org,
	roman.gushchin@...ux.dev,
	lance.yang@...ux.dev,
	shakeel.butt@...ux.dev
Cc: muchun.song@...ux.dev,
	linux-kselftest@...r.kernel.org,
	linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Guopeng Zhang <zhangguopeng@...inos.cn>
Subject: [PATCH v5 3/3] selftests: cgroup: Replace sleep with cg_read_key_long_poll() for waiting on nr_dying_descendants

Replace the manual sleep-and-retry logic in test_kmem_dead_cgroups()
with the new helper `cg_read_key_long_poll()`. This change improves
the robustness of the test by polling the "nr_dying_descendants"
counter in `cgroup.stat` until it reaches 0 or the timeout is exceeded.

Additionally, increase the retry timeout to 8 seconds (from 5 seconds)
based on testing results:
  - With 5-second timeout: 4/20 runs passed.
  - With 8-second timeout: 20/20 runs passed.

The 8 second timeout is based on stress testing of test_kmem_dead_cgroups()
under load: 5 seconds was occasionally not enough for reclaim of dying
descendants to complete, whereas 8 seconds consistently covered the observed
latencies. This value is intended as a generous upper bound for the
asynchronous reclaim and is not tied to any specific kernel constant, so it
can be adjusted in the future if reclaim behavior changes.

Signed-off-by: Guopeng Zhang <zhangguopeng@...inos.cn>
Reviewed-by: Shakeel Butt <shakeel.butt@...ux.dev>
---
 tools/testing/selftests/cgroup/test_kmem.c | 33 ++++++++++------------
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index ca38525484e3..eeabd34bf083 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -26,6 +26,7 @@
  */
 #define MAX_VMSTAT_ERROR (4096 * 64 * get_nprocs())
 
+#define KMEM_DEAD_WAIT_RETRIES        80
 
 static int alloc_dcache(const char *cgroup, void *arg)
 {
@@ -306,9 +307,7 @@ static int test_kmem_dead_cgroups(const char *root)
 {
 	int ret = KSFT_FAIL;
 	char *parent;
-	long dead;
-	int i;
-	int max_time = 20;
+	long dead = -1;
 
 	parent = cg_name(root, "kmem_dead_cgroups_test");
 	if (!parent)
@@ -323,21 +322,19 @@ static int test_kmem_dead_cgroups(const char *root)
 	if (cg_run_in_subcgroups(parent, alloc_dcache, (void *)100, 30))
 		goto cleanup;
 
-	for (i = 0; i < max_time; i++) {
-		dead = cg_read_key_long(parent, "cgroup.stat",
-					"nr_dying_descendants ");
-		if (dead == 0) {
-			ret = KSFT_PASS;
-			break;
-		}
-		/*
-		 * Reclaiming cgroups might take some time,
-		 * let's wait a bit and repeat.
-		 */
-		sleep(1);
-		if (i > 5)
-			printf("Waiting time longer than 5s; wait: %ds (dead: %ld)\n", i, dead);
-	}
+	/*
+	 * Allow up to ~8s for reclaim of dying descendants to complete.
+	 * This is a generous upper bound derived from stress testing, not
+	 * from a specific kernel constant, and can be adjusted if reclaim
+	 * behavior changes in the future.
+	 */
+	dead = cg_read_key_long_poll(parent, "cgroup.stat",
+					"nr_dying_descendants ", 0, KMEM_DEAD_WAIT_RETRIES,
+					DEFAULT_WAIT_INTERVAL_US);
+	if (dead)
+		goto cleanup;
+
+	ret = KSFT_PASS;
 
 cleanup:
 	cg_destroy(parent);
-- 
2.25.1