Message-Id: <20250523-memcg_fix-v1-1-ad3eafb60477@debian.org>
Date: Fri, 23 May 2025 10:21:06 -0700
From: Breno Leitao <leitao@...ian.org>
To: Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>, 
 Roman Gushchin <roman.gushchin@...ux.dev>, 
 Shakeel Butt <shakeel.butt@...ux.dev>, Muchun Song <muchun.song@...ux.dev>, 
 Andrew Morton <akpm@...ux-foundation.org>, 
 Chen Ridong <chenridong@...wei.com>, 
 Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Michal Hocko <mhocko@...e.com>, cgroups@...r.kernel.org, 
 linux-mm@...ck.org, linux-kernel@...r.kernel.org, kernel-team@...a.com, 
 Michael van der Westhuizen <rmikey@...a.com>, 
 Usama Arif <usamaarif642@...il.com>, 
 Pavel Begunkov <asml.silence@...il.com>, Rik van Riel <riel@...riel.com>, 
 Breno Leitao <leitao@...ian.org>
Subject: [PATCH] memcg: Always call cond_resched() after fn()

I am seeing soft lockups on certain machine types when a cgroup
OOMs. This happens because killing a process on those machines can be
very slow, which causes the soft lockup and RCU stalls. It usually
happens when the cgroup has MANY processes and memory.oom.group is
set.

Example I am seeing in real production:

       [462012.244552] Memory cgroup out of memory: Killed process 3370438 (crosvm) ....
       ....
       [462037.318059] Memory cgroup out of memory: Killed process 4171372 (adb) ....
       [462037.348314] watchdog: BUG: soft lockup - CPU#64 stuck for 26s! [stat_manager-ag:1618982]
       ....

A quick look at why this is so slow suggests it is related to serial
flush on certain machine types. In all the crashes I saw, the target
CPU was in console_flush_all().

In the case above, there are thousands of processes in the cgroup, and
the CPU soft locks up before the loop reaches the 1024-iteration limit
in the code (at which point cond_resched() would be called). So calling
cond_resched() only once every 1024 iterations is not sufficient.
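
To make the gap concrete, here is a minimal user-space sketch (not
kernel code) that counts how many consecutive fn() calls can run
between two yield points under each scheme. The helper names
longest_run_counter() and longest_run_always() are hypothetical and
exist only for this illustration; the model ignores how long each
fn() call actually takes.

```c
#include <stddef.h>

/*
 * User-space model only, not kernel code. "run" counts consecutive
 * fn() calls since the last yield point; resetting it stands in for
 * cond_resched().
 */

/* Counter-based scheme: yield before fn() whenever (++i & 1023) == 0. */
static size_t longest_run_counter(size_t ntasks)
{
	size_t i = 0, run = 0, longest = 0;

	for (size_t t = 0; t < ntasks; t++) {
		if ((++i & 1023) == 0)
			run = 0;	/* stands in for cond_resched() */
		run++;			/* stands in for fn(task, arg) */
		if (run > longest)
			longest = run;
	}
	return longest;
}

/* Unconditional scheme: yield after every fn() call. */
static size_t longest_run_always(size_t ntasks)
{
	size_t run = 0, longest = 0;

	for (size_t t = 0; t < ntasks; t++) {
		run++;			/* stands in for fn(task, arg) */
		if (run > longest)
			longest = run;
		run = 0;		/* stands in for cond_resched() */
	}
	return longest;
}
```

With the counter, up to 1024 fn() calls can run back to back without
a yield point, so the time without rescheduling scales with however
slow each call is. With an unconditional yield, the longest run is a
single call.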

Remove the counter-based conditional rescheduling logic and call
cond_resched() unconditionally on each task iteration, right after fn()
is called. This avoids the lockup regardless of how slow fn() is.

Cc: Michael van der Westhuizen <rmikey@...a.com>
Cc: Usama Arif <usamaarif642@...il.com>
Cc: Pavel Begunkov <asml.silence@...il.com>
Suggested-by: Rik van Riel <riel@...riel.com>
Signed-off-by: Breno Leitao <leitao@...ian.org>
Fixes: 46576834291869457 ("memcg: fix soft lockup in the OOM process")
---
 mm/memcontrol.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c96c1f2b9cf57..2d4d65f25fecd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1168,7 +1168,6 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 {
 	struct mem_cgroup *iter;
 	int ret = 0;
-	int i = 0;
 
 	BUG_ON(mem_cgroup_is_root(memcg));
 
@@ -1178,10 +1177,9 @@ void mem_cgroup_scan_tasks(struct mem_cgroup *memcg,
 
 		css_task_iter_start(&iter->css, CSS_TASK_ITER_PROCS, &it);
 		while (!ret && (task = css_task_iter_next(&it))) {
-			/* Avoid potential softlockup warning */
-			if ((++i & 1023) == 0)
-				cond_resched();
 			ret = fn(task, arg);
+			/* Avoid potential softlockup warning */
+			cond_resched();
 		}
 		css_task_iter_end(&it);
 		if (ret) {

---
base-commit: ea15e046263b19e91ffd827645ae5dfa44ebd044
change-id: 20250523-memcg_fix-012257f3109e

Best regards,
-- 
Breno Leitao <leitao@...ian.org>