linux-kernel - Re: [PATCH] mm,oom: Re-enable OOM killer using timers.

Message-ID: <alpine.DEB.2.10.1601141500370.22665@chino.kir.corp.google.com>
Date:	Thu, 14 Jan 2016 15:09:56 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Johannes Weiner <hannes@...xchg.org>
cc:	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	mhocko@...nel.org, Andrew Morton <akpm@...ux-foundation.org>,
	mgorman@...e.de, torvalds@...ux-foundation.org, oleg@...hat.com,
	hughd@...gle.com, andrea@...nel.org, riel@...hat.com,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm,oom: Re-enable OOM killer using timers.

On Thu, 14 Jan 2016, Johannes Weiner wrote:

> > This is where me and you disagree; the goal should not be to continue to 
> > oom kill more and more processes since there is no guarantee that further 
> > kills will result in forward progress.  These additional kills can result 
> > in the same livelock that is already problematic, and killing additional 
> > processes has made the situation worse since memory reserves are more 
> > depleted.
> > 
> > I believe what is better is to exhaust reclaim, check if the page 
> > allocator is constantly looping due to waiting for the same victim to 
> > exit, and then allowing that allocation with memory reserves, see the 
> > attached patch which I have proposed before.
> 
> If giving the reserves to another OOM victim is bad, how is giving
> them to the *allocating* task supposed to be better?

Unfortunately, because victim selection is driven by rss and oom priority, 
it is possible to repeatedly select processes that are all waiting for the 
same mutex.  This happens when loading shards, for example: all processes 
have the same oom priority and are livelocked on i_mutex, which is the most 
common occurrence in our environments.  The livelock came about because we 
selected a process that could not make forward progress, and there is no 
guarantee that we will not continue to select such processes.

Giving access to memory reserves eventually allows all allocators to 
successfully allocate, giving the holder of i_mutex the ability to 
eventually drop it.  In my patch this happens in a very rate-limited 
manner, depending on how you define when the page allocator has looped 
enough waiting for the same process to exit.
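
As a rough illustration of the rate limiting described above, here is a 
minimal user-space sketch, not the patch itself.  The names 
OOM_STALL_LOOPS, struct alloc_ctx and should_use_reserves() are 
hypothetical, and the threshold value is an arbitrary assumption; the 
point is only that an allocation gets reserves after it has retried 
enough times against the same victim that has not exited.

```c
#include <stdbool.h>

/*
 * Hypothetical threshold: how many consecutive allocator retries against
 * the same unexited victim count as "looped enough".  Arbitrary value
 * chosen for illustration, not taken from the patch.
 */
#define OOM_STALL_LOOPS 10

struct alloc_ctx {
	int last_victim;	/* pid of the victim seen on the previous retry, -1 if none */
	unsigned int loops;	/* consecutive retries spent waiting on last_victim */
};

/* Returns true when this retry should be allowed to use memory reserves. */
static bool should_use_reserves(struct alloc_ctx *ctx, int victim_pid)
{
	if (victim_pid < 0) {
		/* No pending victim, so there is nothing to wait on. */
		ctx->last_victim = -1;
		ctx->loops = 0;
		return false;
	}

	if (victim_pid != ctx->last_victim) {
		/* A different victim was selected: start counting again. */
		ctx->last_victim = victim_pid;
		ctx->loops = 1;
		return false;
	}

	if (++ctx->loops < OOM_STALL_LOOPS)
		return false;

	/*
	 * Looped enough on the same victim.  Reset the counter so that the
	 * next grant requires a full new wait, which keeps reserve usage
	 * rate-limited.
	 */
	ctx->loops = 0;
	return true;
}
```

With a context initialized to { -1, 0 }, the first nine retries against 
the same pid return false and the tenth returns true; the counter then 
starts over, so reserves are handed out at most once per 
OOM_STALL_LOOPS retries.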

In the past, we have even increased the scheduling priority of oom killed 
processes so that they have a greater likelihood of picking up i_mutex and 
exiting.

> We need to make the OOM killer conclude in a fixed amount of time, no
> matter what happens. If the system is irrecoverably deadlocked on
> memory it needs to panic (and reboot) so we can get on with it. And
> it's silly to panic while there are still killable tasks available.
> 

What is the solution when there are no additional processes that may be 
killed?  It is better to give access to memory reserves so that a single 
stalling allocation can succeed and the livelock can be resolved, rather 
than panicking.