lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 17 Feb 2016 14:32:42 +0100
From:	Michal Hocko <mhocko@...nel.org>
To:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc:	akpm@...ux-foundation.org, rientjes@...gle.com, mgorman@...e.de,
	oleg@...hat.com, torvalds@...ux-foundation.org, hughd@...gle.com,
	andrea@...nel.org, riel@...hat.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 6/6] mm,oom: wait for OOM victims when using
 oom_kill_allocating_task == 1

On Wed 17-02-16 19:36:36, Tetsuo Handa wrote:
> >From 0b36864d4100ecbdcaa2fc2d1927c9e270f1b629 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
> Date: Wed, 17 Feb 2016 16:37:59 +0900
> Subject: [PATCH 6/6] mm,oom: wait for OOM victims when using oom_kill_allocating_task == 1
> 
> Currently, out_of_memory() does not wait for existing TIF_MEMDIE threads
> if /proc/sys/vm/oom_kill_allocating_task is set to 1. This can result in
> killing more OOM victims than needed. We can wait for the OOM reaper to
> reap memory used by existing TIF_MEMDIE threads if possible. If the OOM
> reaper is not available, the system will be kept OOM stalled until an
> OOM-unkillable thread does a GFP_FS allocation request and calls
> oom_kill_allocating_task == 0 path.
> 
> This patch changes oom_kill_allocating_task == 1 case to call
> select_bad_process() in order to wait for existing TIF_MEMDIE threads.

The primary motivation for oom_kill_allocating_task was to reduce the
overhead of select_bad_process. See fe071d7e8aae ("oom: add
oom_kill_allocating_task sysctl"). So this basically defeats the whole
purpose of the feature.

I am not user of this knob because it behaves absolutely randomly but
IMHO we should simply do something like the following. It would be more
compliant to the documentation and prevent from livelock which is
currently possible (albeit very unlikely) when a single task consimes
all the memory reserves and we keep looping over out_of_memory without
any progress.

But as I've said I have no idea whether somebody relies on the current
behavior so this is more of a thinking loudly than proposing an actual
patch at this point of time.
---
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 078e07ec0906..7de84fb2dd03 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -706,6 +706,9 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	pr_err("%s: Kill process %d (%s) score %u or sacrifice child\n",
 		message, task_pid_nr(p), p->comm, points);
 
+	if (sysctl_oom_kill_allocating_task)
+		goto kill;
+
 	/*
 	 * If any of p's children has a different mm and is eligible for kill,
 	 * the one with the highest oom_badness() score is sacrificed for its
@@ -734,6 +737,7 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	}
 	read_unlock(&tasklist_lock);
 
+kill:
 	p = find_lock_task_mm(victim);
 	if (!p) {
 		put_task_struct(victim);
@@ -888,6 +892,9 @@ bool out_of_memory(struct oom_control *oc)
 	if (sysctl_oom_kill_allocating_task && current->mm &&
 	    !oom_unkillable_task(current, NULL, oc->nodemask) &&
 	    current->signal->oom_score_adj != OOM_SCORE_ADJ_MIN) {
+		if (test_thread_flag(TIF_MEMDIE))
+			panic("Out of memory (oom_kill_allocating_task) not able to make a forward progress");
+
 		get_task_struct(current);
 		oom_kill_process(oc, current, 0, totalpages, NULL,
 				 "Out of memory (oom_kill_allocating_task)");
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ