Date:	Mon, 21 Sep 2015 16:27:31 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Christoph Lameter <cl@...ux.com>
cc:	Oleg Nesterov <oleg@...hat.com>, Kyle Walker <kwalker@...hat.com>,
	akpm@...ux-foundation.org, mhocko@...e.cz, hannes@...xchg.org,
	vdavydov@...allels.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	Stanislav Kozina <skozina@...hat.com>
Subject: Re: [PATCH] mm/oom_kill.c: don't kill TASK_UNINTERRUPTIBLE tasks

On Fri, 18 Sep 2015, Christoph Lameter wrote:

> Subject: Allow multiple kills from the OOM killer
> 
> The OOM killer currently aborts if it finds a process that already is having
> access to the reserve memory pool for exit processing. This is done so that
> the reserves are not overcommitted but on the other hand this also allows
> only one process being oom killed at the time. That process may be stuck
> in D state.
> 
> Signed-off-by: Christoph Lameter <cl@...ux.com>
> 
> Index: linux/mm/oom_kill.c
> ===================================================================
> --- linux.orig/mm/oom_kill.c	2015-09-18 11:58:52.963946782 -0500
> +++ linux/mm/oom_kill.c	2015-09-18 11:59:42.010684778 -0500
> @@ -264,10 +264,9 @@ enum oom_scan_t oom_scan_process_thread(
>  	 * This task already has access to memory reserves and is being killed.
>  	 * Don't allow any other task to have access to the reserves.
>  	 */
> -	if (test_tsk_thread_flag(task, TIF_MEMDIE)) {
> -		if (oc->order != -1)
> -			return OOM_SCAN_ABORT;
> -	}
> +	if (test_tsk_thread_flag(task, TIF_MEMDIE))
> +		return OOM_SCAN_CONTINUE;
> +
>  	if (!task->mm)
>  		return OOM_SCAN_CONTINUE;
> 
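
For readers comparing the two behaviours, here is a minimal userspace 
model of the check this hunk touches.  It is only a sketch: the enum and 
function names are made up for illustration and are not kernel symbols, 
and MODEL_FALLTHROUGH simply stands for "continue with the remaining 
checks in oom_scan_process_thread()", which are not modelled here.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the scan results; not the kernel's enum. */
enum model_scan {
	MODEL_FALLTHROUGH,	/* proceed to the later checks (not modelled) */
	MODEL_CONTINUE,		/* skip this task, keep scanning */
	MODEL_ABORT		/* stop the scan, kill nothing new */
};

/* Behaviour before the patch: a TIF_MEMDIE task aborts the whole scan. */
static enum model_scan scan_before(bool memdie, int order, bool has_mm)
{
	if (memdie) {
		if (order != -1)
			return MODEL_ABORT;
	}
	if (!has_mm)
		return MODEL_CONTINUE;
	return MODEL_FALLTHROUGH;
}

/* Behaviour after the patch: a TIF_MEMDIE task is skipped instead. */
static enum model_scan scan_after(bool memdie, int order, bool has_mm)
{
	(void)order;	/* no longer consulted by this check */
	if (memdie)
		return MODEL_CONTINUE;
	if (!has_mm)
		return MODEL_CONTINUE;
	return MODEL_FALLTHROUGH;
}
```

With memdie set and order of 0, scan_before() returns MODEL_ABORT while 
scan_after() returns MODEL_CONTINUE, so the scan moves on and may select 
another victim.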

If this guaranteed that the newly chosen process would exit, it would be 
fine.  Unfortunately, no such guarantee is possible.  If a thread is 
holding a contended mutex that the victim(s) require, this serial oom 
killer could kill one process after another without freeing any memory 
and eventually panic the system if that thread is OOM_DISABLE.

The solution that we have merged internally is described at 
http://marc.info/?l=linux-kernel&m=144010444913702 -- we provide access to 
memory reserves to processes that find a stalled exit in the oom killer so 
that they may allocate.  It comes along with a test module that takes a 
contended mutex and ensures that forward progress is made as long as 
memory reserves are not depleted.  We can't actually guarantee that memory 
reserves won't be depleted, but we (1) hope that nobody is actually 
allocating a lot of memory before dropping a mutex and (2) want to avoid 
the alternative which is a system livelock.
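
As a rough illustration of the idea only, and not the code from the link 
above, the decision can be modelled in userspace as below.  Everything 
here is an assumption made for the sketch: the struct, the function 
names, the tick-based clock and the STALL_TICKS threshold are all 
hypothetical, and how a "stalled exit" is actually detected is not 
described in this mail.

```c
#include <stdbool.h>

/* Hypothetical threshold: how long a victim may take before we call it stalled. */
#define STALL_TICKS 1000UL

/* Hypothetical model of an oom victim; not a kernel structure. */
struct victim_model {
	bool memdie;			/* stands in for TIF_MEMDIE */
	unsigned long killed_at;	/* tick at which it was chosen */
};

/* The victim was chosen but has not exited within the assumed threshold. */
static bool victim_exit_stalled(const struct victim_model *v, unsigned long now)
{
	return v->memdie && (now - v->killed_at) >= STALL_TICKS;
}

/*
 * An allocating task that finds a stalled victim is allowed to dip into
 * reserves, but only while some reserve pages remain.
 */
static bool may_use_reserves(const struct victim_model *v, unsigned long now,
			     unsigned long reserve_pages)
{
	return victim_exit_stalled(v, now) && reserve_pages > 0;
}
```

The reserve_pages check reflects the caveat above: once reserves are 
depleted, this approach cannot make forward progress either.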

This will address situations such as

	allocator			oom victim
	---------			----------
	mutex_lock(lock)
	alloc_pages(GFP_KERNEL)
					mutex_lock(lock)
					mutex_unlock(lock)
					handle SIGKILL

Without a solution such as mine, this results in a livelock: the 
GFP_KERNEL allocation stalls forever waiting for the oom victim to exit, 
while the victim cannot exit until it acquires the mutex that the 
allocator holds.  This also works if the allocator is OOM_DISABLE.
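
The dependency in the table above can be walked through with a toy 
simulation.  This is a userspace sketch under stated assumptions, not 
kernel code: the struct and function names are invented, free memory is 
assumed to be exhausted so that only reserves can satisfy the 
allocation, and the allocator is assumed to drop the mutex right after 
its allocation succeeds.

```c
#include <stdbool.h>

/* Hypothetical state for the two tasks in the table above. */
struct sim {
	bool allocator_holds_lock;	/* allocator did mutex_lock(lock) */
	bool victim_exited;		/* victim handled SIGKILL */
	int reserve_pages;		/* pages left in the reserves */
};

/* Run one round; returns true if either task made progress. */
static bool sim_step(struct sim *s, bool reserves_allowed)
{
	if (s->allocator_holds_lock) {
		/* alloc_pages(GFP_KERNEL): only reserves can satisfy it. */
		if (reserves_allowed && s->reserve_pages > 0) {
			s->reserve_pages--;
			s->allocator_holds_lock = false; /* mutex_unlock(lock) */
			return true;
		}
		return false;	/* allocator stalls, victim waits on the mutex */
	}
	if (!s->victim_exited) {
		/* Mutex is free: victim locks, unlocks, handles SIGKILL. */
		s->victim_exited = true;
		return true;
	}
	return false;
}

/* Returns true if the victim eventually exits, false on livelock. */
static bool sim_victim_exits(bool reserves_allowed, int reserve_pages)
{
	struct sim s = { true, false, reserve_pages };

	while (sim_step(&s, reserves_allowed))
		;
	return s.victim_exited;
}
```

In this model sim_victim_exits(false, 8) returns false, which is the 
livelock, sim_victim_exits(true, 8) returns true, and 
sim_victim_exits(true, 0) returns false because the reserves are already 
depleted.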

This won't handle other situations where the victim gets wedged in D state 
and nobody is allocating memory on its behalf, but the contended-mutex 
case above is by far the more common occurrence that we have dealt with.
