linux-kernel - Re: [patch 1/2] mm, memcg: avoid oom notification when current needs access to memory reserves

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20131212121140.GD2630@dhcp22.suse.cz>
Date:	Thu, 12 Dec 2013 13:11:40 +0100
From:	Michal Hocko <mhocko@...e.cz>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Johannes Weiner <hannes@...xchg.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	cgroups@...r.kernel.org
Subject: Re: [patch 1/2] mm, memcg: avoid oom notification when current needs
 access to memory reserves

On Thu 12-12-13 11:31:59, Michal Hocko wrote:
[...]
> The semantic would be as simple as "notification is sent only when
> an action is due". It will be still racy as nothing prevents a task
> which is not under OOM to exit and release some memory but there is no
> sensible way to address that. On the other hand such a semantic would be
> sensible for oom_control listeners because they will know that an action
> has to be or will be taken (the line was drawn).
> 
> Can we agree on this, Johannes? Or you see the line drawn when
> mem_cgroup_oom_synchronize has been reached already no matter whether
> the action is to be done or not?

Something like the following:

>From 5d9c01e2814a7ade49db7945ad3890f4f138855e Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@...e.cz>
Date: Thu, 12 Dec 2013 11:50:17 +0100
Subject: [PATCH] memcg: notify userspace about OOM when and action is due

Userspace is currently notified about OOM condition after fails
to reclaim any memory after MEM_CGROUP_RECLAIM_RETRIES rounds.
This usually means that the memcg is really in troubles and an
OOM action (either done by userspace or kernel) has to be taken.
The kernel OOM killer however bails out and doesn't kill anything
if it sees an already dying/exiting task in a good hope a memory
will be released and OOM situation will be resolved.

Therefore it makes sense to notify userspace only after really all
measures have been taken and an userspace action is required or
the kernel kills a task.

This patch also removes fatal_signal_pending and PF_EXITING check from
mem_cgroup_oom_synchronize because __mem_cgroup_try_charge already
checks for both and bypasses charge so we cannot end up in the oom path.

Signed-off-by: Michal Hocko <mhocko@...e.cz>
---
 mm/memcontrol.c | 17 ++++-------------
 mm/oom_kill.c   |  5 +++++
 2 files changed, 9 insertions(+), 13 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 98900c070045..af7148c77bac 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2235,16 +2235,6 @@ bool mem_cgroup_oom_synchronize(bool handle)
 	if (!handle)
 		goto cleanup;
 
-	/*
-	 * If current has a pending SIGKILL or is exiting, then automatically
-	 * select it.  The goal is to allow it to allocate so that it may
-	 * quickly exit and free its memory.
-	 */
-	if (fatal_signal_pending(current) || current->flags & PF_EXITING) {
-		set_thread_flag(TIF_MEMDIE);
-		goto cleanup;
-	}
-
 	owait.memcg = memcg;
 	owait.wait.flags = 0;
 	owait.wait.func = memcg_oom_wake_function;
@@ -2256,15 +2246,16 @@ bool mem_cgroup_oom_synchronize(bool handle)
 
 	locked = mem_cgroup_oom_trylock(memcg);
 
-	if (locked)
-		mem_cgroup_oom_notify(memcg);
-
 	if (locked && !memcg->oom_kill_disable) {
 		mem_cgroup_unmark_under_oom(memcg);
 		finish_wait(&memcg_oom_waitq, &owait.wait);
+		/* calls mem_cgroup_oom_notify if there is a task to kill */
 		mem_cgroup_out_of_memory(memcg, current->memcg_oom.gfp_mask,
 					 current->memcg_oom.order);
 	} else {
+		if (locked && memcg->oom_kill_disable)
+			mem_cgroup_oom_notify(memcg);
+
 		schedule();
 		mem_cgroup_unmark_under_oom(memcg);
 		finish_wait(&memcg_oom_waitq, &owait.wait);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 1e4a600a6163..47c9de8da36d 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -394,6 +394,8 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
 		dump_tasks(memcg, nodemask);
 }
 
+extern void mem_cgroup_oom_notify(struct mem_cgroup *memcg);
+
 #define K(x) ((x) << (PAGE_SHIFT-10))
 /*
  * Must be called while holding a reference to p, which will be released upon
@@ -470,6 +472,9 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 		victim = p;
 	}
 
+	if (memcg)
+		mem_cgroup_oom_notify(memcg);
+
 	/* mm cannot safely be dereferenced after task_unlock(victim) */
 	mm = victim->mm;
 	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
-- 
1.8.4.4

-- 
Michal Hocko
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/