linux-kernel - Re: [PATCH 2/3] oom, oom_reaper: Try to reap tasks which skip regular OOM killer path

lists.openwall.net
Open Source and information security mailing list archives

Message-ID: <20160411134321.GI23157@dhcp22.suse.cz>
Date:	Mon, 11 Apr 2016 15:43:21 +0200
From:	Michal Hocko <mhocko@...nel.org>
To:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc:	linux-mm@...ck.org, rientjes@...gle.com, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org, oleg@...hat.com
Subject: Re: [PATCH 2/3] oom, oom_reaper: Try to reap tasks which skip
 regular OOM killer path

On Mon 11-04-16 22:26:09, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > On Sat 09-04-16 13:39:30, Tetsuo Handa wrote:
> > > Michal Hocko wrote:
> > > > On Fri 08-04-16 20:19:28, Tetsuo Handa wrote:
> > > > > I looked at next-20160408 but I again came to think that we should remove
> > > > > these shortcuts (something like the patch shown at the bottom).
> > > >
> > > > feel free to send the patch with the full description. But I would
> > > > really encourage you to check the history to learn why those have been
> > > > added and describe why those concerns are not valid/important anymore.
> > > 
> > > I believe that past discussions and decisions about current code are too
> > > optimistic because they did not take 'The "too small to fail" memory-
> > > allocation rule' problem into account.
> > 
> > In most cases they were driven by _real_ use cases, though. And that
> > is what matters. Theoretically possible issues which happen under
> > crazy workloads which are DoSing the machine already are not something
> > to optimize for. Sure we should try to cope with them as gracefully
> > as possible, no questions about that, but we should try hard not to
> > reintroduce previous issues during _sensible_ workloads.
> 
> I'm not asking you to optimize for crazy workloads. None of my
> customers intentionally run crazy workloads, but they are getting silent
> hangs and I suspect that something went wrong with memory management.

There are many other possible reasons for these symptoms. Have you
actually seen any _evidence_ that the hang they are seeing is due to
an oom deadlock, though? A single crash dump or consistent sysrq output
pointing in that direction?

> But there is no evidence because the memory management subsystem remains
> silent. You call my testcases DoS, but there is no evidence that my
> customers are not hitting the same problem my testcases found.

This is really impossible to comment on.

> I'm suggesting that you at least emit diagnostic messages when something
> goes wrong. That is what kmallocwd is for. And if you do not want to emit
> diagnostic messages, I'm fine with a timeout-based approach.

I am all for more diagnostics, but what you were proposing was so
heavyweight that it doesn't really seem worth it.

Anyway, yet again this is getting largely off-topic...
-- 
Michal Hocko
SUSE Labs