Message-Id: <201602171929.IFG12927.OVFJOQHOSMtFFL@I-love.SAKURA.ne.jp>
Date: Wed, 17 Feb 2016 19:29:33 +0900
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To: mhocko@...nel.org, akpm@...ux-foundation.org
Cc: rientjes@...gle.com, mgorman@...e.de, oleg@...hat.com,
torvalds@...ux-foundation.org, hughd@...gle.com, andrea@...nel.org,
riel@...hat.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: [PATCH 1/6] mm,oom: exclude TIF_MEMDIE processes from candidates.
From 142b08258e4c60834602e9b0a734564208bc6397 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Date: Wed, 17 Feb 2016 16:29:29 +0900
Subject: [PATCH 1/6] mm,oom: exclude TIF_MEMDIE processes from candidates.

The OOM reaper kernel thread can reclaim the OOM victim's memory before
the victim itself releases it. But it is possible that a TIF_MEMDIE
thread gets stuck at down_read(&mm->mmap_sem) in exit_mm() called from
do_exit(), because one of the !TIF_MEMDIE threads is doing a GFP_KERNEL
allocation between down_write(&mm->mmap_sem) and up_write(&mm->mmap_sem)
(e.g. in mmap()). In that case, we need SysRq-f (manual invocation of
the OOM killer) to make progress, because down_read_trylock(&mm->mmap_sem)
by the OOM reaper will not succeed. There are also other situations in
which the OOM reaper cannot reap the victim's memory (e.g. CONFIG_MMU=n,
or the victim's memory is shared with OOM-unkillable processes); these
likewise require manual SysRq-f for making progress.
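
The problematic interaction, sketched below for illustration only
(simplified; assumes two threads sharing one mm):

  Thread-1 (!TIF_MEMDIE)               Thread-2 (TIF_MEMDIE victim)

  /* e.g. in mmap() */
  down_write(&mm->mmap_sem);
  kmalloc(size, GFP_KERNEL);
    /* waits forever for the OOM
       victim (Thread-2) to exit */
                                       do_exit()
                                         exit_mm()
                                           down_read(&mm->mmap_sem);
                                           /* blocks behind Thread-1's
                                              writer; never returns */
  up_write(&mm->mmap_sem); /* never reached */

Meanwhile, the OOM reaper's down_read_trylock(&mm->mmap_sem) fails
because Thread-1 holds the lock for write.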

However, it is possible that the OOM killer chooses the same OOM
victim, which already has TIF_MEMDIE, forever. This effectively
disables SysRq-f. This patch excludes processes which have a
TIF_MEMDIE thread from the OOM victim candidates.
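
For reference, a minimal sketch of the per-thread check this approach
relies on (process_has_memdie_thread() is a hypothetical name for
illustration, not part of the patch; the caller is assumed to hold
rcu_read_lock()):

	static bool process_has_memdie_thread(struct task_struct *p)
	{
		struct task_struct *t;

		/*
		 * A single TIF_MEMDIE thread makes the whole process
		 * ineligible as a further OOM victim candidate.
		 */
		for_each_thread(p, t)
			if (test_tsk_thread_flag(t, TIF_MEMDIE))
				return true;
		return false;
	}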
Signed-off-by: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
---
mm/oom_kill.c | 30 +++++++++++++++++++++++++++---
1 file changed, 27 insertions(+), 3 deletions(-)
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 871470f..27949ef 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -119,6 +119,30 @@ found:
}
/*
+ * Treat the whole process p as unkillable when one of its threads has
+ * TIF_MEMDIE set. Otherwise, we may end up setting TIF_MEMDIE on the
+ * same victim forever (thus making SysRq-f unusable).
+ */
+static struct task_struct *find_lock_non_victim_task_mm(struct task_struct *p)
+{
+ struct task_struct *t;
+
+ rcu_read_lock();
+
+ for_each_thread(p, t) {
+ if (likely(!test_tsk_thread_flag(t, TIF_MEMDIE)))
+ continue;
+ t = NULL;
+ goto found;
+ }
+ t = find_lock_task_mm(p);
+ found:
+ rcu_read_unlock();
+
+ return t;
+}
+
+/*
* order == -1 means the oom kill is required by sysrq, otherwise only
* for display purposes.
*/
@@ -165,7 +189,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
if (oom_unkillable_task(p, memcg, nodemask))
return 0;
- p = find_lock_task_mm(p);
+ p = find_lock_non_victim_task_mm(p);
if (!p)
return 0;
@@ -361,7 +385,7 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
if (oom_unkillable_task(p, memcg, nodemask))
continue;
- task = find_lock_task_mm(p);
+ task = find_lock_non_victim_task_mm(p);
if (!task) {
/*
* This is a kthread or all of p's threads have already
@@ -562,7 +586,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
}
read_unlock(&tasklist_lock);
- p = find_lock_task_mm(victim);
+ p = find_lock_non_victim_task_mm(victim);
if (!p) {
put_task_struct(victim);
return;
--
1.8.3.1