linux-kernel - Re: [RFC 1/3] oom, sysrq: Skip over oom victims and killed tasks


Message-ID: <alpine.DEB.2.10.1601201550060.18155@chino.kir.corp.google.com>
Date:	Wed, 20 Jan 2016 16:01:54 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Michal Hocko <mhocko@...nel.org>
cc:	linux-mm@...ck.org,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC 1/3] oom, sysrq: Skip over oom victims and killed tasks

On Wed, 20 Jan 2016, Michal Hocko wrote:

> No, I do not have a specific load in mind. But let's be realistic. There
> will _always_ be corner cases where the VM cannot react properly or in a
> timely fashion.
> 

Then let's identify them and fix them, like we do with any other bug?  I'm 99% 
certain you are not advocating that human intervention is the ideal 
solution to prevent lengthy stalls or livelocks.

I can't speak for all possible configurations and workloads; the only 
thing we use sysrq+f for is automated testing of the oom killer itself.  
It would help to know of any situations when people actually need to use 
this to solve issues and then fix those issues rather than insisting that 
this is the ideal solution.

> To be honest I really fail to understand your line of argumentation
> here. Just that you think that sysrq+f might be not helpful in large
> datacenters which you seem to care about, doesn't mean that it is not
> helpful in other setups.
> 

This type of message isn't really contributing anything.  You don't have a 
specific load in mind, you can't identify a pending bug that people have 
complained about, you presumably can't show a testcase that demonstrates 
how it's required, yet you're arguing that we should keep a debugging tool 
around because you think somebody somewhere sometime might use it.

 [ I would imagine that users would be unhappy they have to kill processes 
   already, and would have reported how ridiculous it is that they had to
   use sysrq+f, but I haven't seen those bug reports. ]

I want the VM to be responsive, I don't want it to thrash forever, and I 
want it to not require root to trigger a sysrq to have the kernel kill a 
process for the VM to work properly.  We either need to fix the issue that 
causes the unresponsiveness or oom kill processes earlier.  This is very 
simple.