lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160120094938.GB14187@dhcp22.suse.cz>
Date:	Wed, 20 Jan 2016 10:49:38 +0100
From:	Michal Hocko <mhocko@...nel.org>
To:	David Rientjes <rientjes@...gle.com>
Cc:	linux-mm@...ck.org,
	Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC 1/3] oom, sysrq: Skip over oom victims and killed tasks

On Tue 19-01-16 14:57:33, David Rientjes wrote:
> On Fri, 15 Jan 2016, Michal Hocko wrote:
> 
> > > I think it's time to kill sysrq+F and I'll send those two patches
> > > unless there is a usecase I'm not aware of.
> > 
> > I have described one in the part you haven't quoted here. Let me repeat:
> > : Your system might be trashing to the point you are not able to log in
> > : and resolve the situation in a reasonable time yet you are still not
> > : OOM. sysrq+f is your only choice then.
> > 
> > Could you clarify why it is better to ditch a potentially usefull
> > emergency tool rather than to make it work reliably and predictably?
> 
> I'm concerned about your usecase where the kernel requires admin 
> intervention to resolve such an issue and there is nothing in the VM we 
> can do to fix it.
> 
> If you have a specific test that demonstrates when your usecase is needed, 
> please provide it so we can address the issue that it triggers.

No, I do not have a specific load in mind. But let's be realistic. There
will _always_ be corner cases where the VM cannot react properly or in a
timely fashion.

> I'd prefer to fix the issue in the VM rather than require human
> intervention, especially when we try to keep a very large number of
> machines running in our datacenters.

It is always preferable to resolve the mm related issue automagically,
of course. We should strive for robustness as much as possible but that
doesn't mean we should get the only emergency tool out of administrator
hands.

To be honest I really fail to understand your line of argumentation
here. Just that you think that sysrq+f might be not helpful in large
datacenters which you seem to care about, doesn't mean that it is not
helpful in other setups.

Removing the functionality is out of question IMHO so can we please
start discussing how to make it more predictable please?
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ