lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241211105336.380cb545@fangorn>
Date: Wed, 11 Dec 2024 10:53:36 -0500
From: Rik van Riel <riel@...riel.com>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Michal Hocko <mhocko@...nel.org>, Roman Gushchin
 <roman.gushchin@...ux.dev>, Shakeel Butt <shakeel.butt@...ux.dev>, Muchun
 Song <muchun.song@...ux.dev>, Andrew Morton <akpm@...ux-foundation.org>,
 cgroups@...r.kernel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 kernel-team@...a.com, Nhat Pham <nphamcs@...il.com>, Yosry Ahmed
 <yosryahmed@...gle.com>
Subject: [PATCH] memcg: allow exiting tasks to write back data to swap

A task already in exit can get stuck trying to allocate pages, if its
cgroup is at the memory.max limit, the cgroup is using zswap, but
zswap writeback is enabled, and the remaining memory in the cgroup is
not compressible.

This seems like an unlikely confluence of events, but it can happen
quite easily if a cgroup is OOM killed due to exceeding its memory.max
limit, and all the tasks in the cgroup are trying to exit simultaneously.

When this happens, it can sometimes take hours for tasks to exit,
as they are all trying to squeeze things into zswap to bring the group's
memory consumption below memory.max.

Allowing these exiting programs to push some memory from their own
cgroup into swap allows them to quickly bring the cgroup's memory
consumption below memory.max, and exit in seconds rather than hours.

Loading this fix as a live patch on a system where a workload got stuck
exiting allowed the workload to exit within a fraction of a second.

Signed-off-by: Rik van Riel <riel@...riel.com>
---
 mm/memcontrol.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7b3503d12aaf..03d77e93087e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5371,6 +5371,15 @@ bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
 	if (!zswap_is_enabled())
 		return true;
 
+	/*
+	 * Always allow exiting tasks to push data to swap. A process in
+	 * the middle of exit cannot get OOM killed, but may need to push
+	 * uncompressible data to swap in order to get the cgroup memory
+	 * use below the limit, and make progress with the exit.
+	 */
+	if ((current->flags & PF_EXITING) && memcg == mem_cgroup_from_task(current))
+		return true;
+
 	for (; memcg; memcg = parent_mem_cgroup(memcg))
 		if (!READ_ONCE(memcg->zswap_writeback))
 			return false;
-- 
2.47.0



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ