Date:	Wed, 15 Jan 2014 16:01:08 +0100
From:	Michal Hocko <mhocko@...e.cz>
To:	<linux-mm@...ck.org>
Cc:	LKML <linux-kernel@...r.kernel.org>,
	Johannes Weiner <hannes@...xchg.org>,
	David Rientjes <rientjes@...gle.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: [RFC 3/3] memcg,oom: do not check PF_EXITING and do not set TIF_MEMDIE

The memcg OOM handler mimics the global OOM handler heuristics. One of
them is to give a dying task (one with either fatal signals pending or
PF_EXITING set) access to memory reserves via the TIF_MEMDIE flag. This
is not necessary, though, because the memory has already been allocated
by the time it is charged against a memcg, so we do not need to abuse
the flag.

The fatal_signal_pending check is a bit tricky, because the current
task might have been killed during reclaim by an action of the
vmpressure/thresholds handlers, and we definitely want to prevent an
OOM kill in such situations.
The current check is incomplete, though: it only covers the current
task, because oom_scan_process_thread doesn't check for
fatal_signal_pending. oom_scan_process_thread is shared between the
global and memcg OOM killers, so we cannot simply abort scanning for
killed tasks there. Instead, we can move the check down into
mem_cgroup_out_of_memory and break out of the task iteration loop when
a killed task is encountered. We could check for PF_EXITING as well,
but it is dubious that this would help much more, as a task should
exit quite quickly once it is scheduled.
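
For illustration only (not part of the patch): a minimal, self-contained
userspace sketch of the resulting control flow, where the per-task check
shares the existing abort teardown path by jumping to a label placed
inside the switch. All names here (pick_victim, scan_task, struct task)
are made up for this sketch and do not exist in the kernel.

#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the oom_scan_process_thread() results. */
enum scan { SCAN_OK, SCAN_CONTINUE, SCAN_ABORT };

struct task { int pid; bool killed; };

static enum scan scan_task(struct task *t)
{
	(void)t;		/* placeholder for the real scanning logic */
	return SCAN_OK;
}

static void pick_victim(struct task *tasks, int nr)
{
	for (int i = 0; i < nr; i++) {
		struct task *t = &tasks[i];

		/* An already-killed task: stop scanning altogether. */
		if (t->killed)
			goto abort;

		switch (scan_task(t)) {
		case SCAN_CONTINUE:
			continue;
		case SCAN_ABORT:
abort:
			/* shared teardown path for both abort reasons */
			printf("aborting scan at pid %d\n", t->pid);
			return;
		case SCAN_OK:
			break;
		}
		/* ... score the task and possibly remember it as the victim ... */
	}
}

int main(void)
{
	struct task tasks[] = { { 1, false }, { 2, true }, { 3, false } };

	pick_victim(tasks, 3);
	return 0;
}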

Signed-off-by: Michal Hocko <mhocko@...e.cz>
---
 mm/memcontrol.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 97ae5cf12f5e..ea9564895f54 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1761,16 +1761,6 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 	unsigned int points = 0;
 	struct task_struct *chosen = NULL;
 
-	/*
-	 * If current has a pending SIGKILL or is exiting, then automatically
-	 * select it.  The goal is to allow it to allocate so that it may
-	 * quickly exit and free its memory.
-	 */
-	if (fatal_signal_pending(current)) {
-		set_thread_flag(TIF_MEMDIE);
-		return;
-	}
-
 	check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, order, NULL);
 	totalpages = mem_cgroup_get_limit(memcg) >> PAGE_SHIFT ? : 1;
 	for_each_mem_cgroup_tree(iter, memcg) {
@@ -1779,6 +1769,16 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 		css_task_iter_start(&iter->css, &it);
 		while ((task = css_task_iter_next(&it))) {
+			/*
+			 * Killed tasks are selected automatically. The goal is
+			 * to give the task some more time to exit and release
+			 * the memory.
+			 * Unlike for the global OOM handler we do not need
+			 * access to memory reserves.
+			 */
+			if (fatal_signal_pending(task))
+				goto abort;
+
 			switch (oom_scan_process_thread(task, totalpages, NULL,
 							false)) {
 			case OOM_SCAN_SELECT:
@@ -1791,6 +1791,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 			case OOM_SCAN_CONTINUE:
 				continue;
 			case OOM_SCAN_ABORT:
+abort:
 				css_task_iter_end(&it);
 				mem_cgroup_iter_break(memcg, iter);
 				if (chosen)
-- 
1.8.5.2
