linux-kernel - Re: [patch 4/7 -mm] oom: badness heuristic rewrite

Message-ID: <alpine.DEB.2.00.1002121251130.7972@chino.kir.corp.google.com>
Date:	Fri, 12 Feb 2010 13:00:10 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Minchan Kim <minchan.kim@...il.com>
cc:	Rik van Riel <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Nick Piggin <npiggin@...e.de>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Lubos Lunak <l.lunak@...e.cz>, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org
Subject: Re: [patch 4/7 -mm] oom: badness heuristic rewrite

On Fri, 12 Feb 2010, Minchan Kim wrote:

> > True, that's a great example of why child tasks should be sacrificed for 
> > the parent: if the oom killer is being called then we are truly overloaded 
> > and there's no shame in killing excessive client connections to recover, 
> > otherwise we might find the entire server becoming unresponsive.  The user 
> > can easily tune /proc/sys/vm/oom_forkbomb_thres to define what 
> > "excessive" is to assess the penalty, if any.  I'll add that to the 
> > comment if we require a second revision.
> > 
> 
> I am worried about the opposite case.
> 
> If a forkbomb parent continuously creates many children in a short time
> (e.g., 2000 per second) and we keep killing a child rather than the
> parent, the system is almost unresponsive, I think.

The oom killer is not the appropriate place for a kernel forkbomb policy
to be implemented; you'd need to address that concern in the scheduler.
When I've brought that up in the past, the response has been that if we
aren't out of memory, then it isn't a problem.  It is a problem for buggy
applications, though, because their timeslice is now spread across an
egregious number of tasks that they are perhaps leaking, which is
detrimental to their server's performance.  I'm not saying that we need to
enforce a hard limit on how many tasks a server forks, for instance, but
the scheduler can detect forkbombs much more easily than the oom killer's
tasklist scan can, and it could at least indicate to us with a process
flag that a task is a likely forkbomb.
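
As a rough userspace illustration of that kind of flag (not kernel code,
and not part of this patchset), a fork-rate tracker might look like the
sketch below.  The class name, the sliding window, and the limits are all
assumptions made up for the example; only the 2000-forks-per-second rate
comes from the case described above.

```python
from collections import deque


class ForkRateTracker:
    """Hypothetical per-process fork counter; not a kernel interface."""

    def __init__(self, max_forks, window_secs):
        self.max_forks = max_forks        # assumed limit per window
        self.window_secs = window_secs    # assumed sliding-window length
        self.fork_times = deque()
        self.likely_forkbomb = False      # stands in for the process flag

    def record_fork(self, now):
        """Record one fork at time `now` (seconds) and update the flag."""
        self.fork_times.append(now)
        # Forget forks that have fallen out of the sliding window.
        while self.fork_times and now - self.fork_times[0] > self.window_secs:
            self.fork_times.popleft()
        self.likely_forkbomb = len(self.fork_times) > self.max_forks
        return self.likely_forkbomb


# 2000 forks spread over one second trip the flag; one fork per second
# does not.
bomb = ForkRateTracker(max_forks=100, window_secs=1.0)
for i in range(2000):
    bomb.record_fork(i / 2000)

server = ForkRateTracker(max_forks=100, window_secs=1.0)
for i in range(10):
    server.record_fork(float(i))
```

The point is only that the fork path already sees every fork as it
happens, so a flag like this is cheap to maintain there, whereas the oom
killer would have to infer the same thing from a tasklist scan.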

> I suffered from that case with LTP on a system with no swap.
> It might be a corner case, but it might happen in real use.
> 

If you look at the patchset overall and not just this one patch, you'll 
notice that we now kill the child with the highest badness() score first, 
i.e. generally the one consuming the most memory.  That is radically
different from the previous behavior and should prevent the system from
becoming unresponsive.  The goal is to allow the user to react to the 
forkbomb rather than implement a strict detection and handling heuristic 
that kills innocent servers and system daemons.
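
A minimal sketch of that selection order, in userspace Python rather than
kernel C.  The Task type is hypothetical, and badness() here is assumed to
be nothing more than resident memory; the real score in the patchset takes
more into account.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical stand-in for a task; not the kernel's task_struct."""
    name: str
    rss_pages: int
    children: list = field(default_factory=list)


def badness(task):
    # Assumption for the example: the score is just resident memory.
    return task.rss_pages


def choose_victim(selected):
    """Sacrifice the highest-scoring child, or the task itself if it has none."""
    if not selected.children:
        return selected
    return max(selected.children, key=badness)


parent = Task("server", 50, [
    Task("worker-a", 10),
    Task("worker-b", 500),
    Task("worker-c", 40),
])
victim = choose_victim(parent)
```

Here the victim is worker-b, the largest child, so a single kill frees the
most memory while the parent keeps running.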

> If we can be sure the task is a buggy forkbomb, it would be better to
> kill it. But it's hard to be sure that it's a buggy forkbomb.
> 
> Could we solve this problem as follows?
> If the OOM killer selects a victim that was also the victim selected
> right before, and that repeats 5 times, for example, then we kill the
> victim (the buggy forkbomb) itself rather than one of its children. It
> is assumed that a normal forkbomb is controlled by an admin who uses
> oom_forkbomb_thres well, so selecting the same victim more than five
> times in a row doesn't happen.
> 

That doesn't work with Rik's example of a webserver that forks a large 
number of threads to handle client connections.  It is _always_ better to 
kill a child instead of making the entire webserver unresponsive.
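
To make the failure concrete, here is a sketch of the proposed
repeat-count rule applied to such a webserver.  It is a toy model: the
function, the names, and the reading of "5 times" as "five child kills,
then the parent" are assumptions for illustration.

```python
REPEAT_LIMIT = 5  # the "5 times" from the proposal


def victims_under_repeat_rule(parent, children, oom_events):
    """Return the names killed when the same parent is selected every time."""
    killed = []
    remaining = list(children)
    repeats = 0
    for _ in range(oom_events):
        repeats += 1
        if repeats > REPEAT_LIMIT or not remaining:
            # The rule gives up on children and kills the parent itself.
            killed.append(parent)
            break
        killed.append(remaining.pop())
    return killed


workers = [f"httpd-worker-{i}" for i in range(100)]
killed = victims_under_repeat_rule("httpd", workers, oom_events=6)
```

With 100 workers and six oom events in a row, the list ends with httpd
itself: the whole server goes down even though 95 expendable workers were
still available.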

In other words, doing anything in the oom killer other than slightly 
penalizing these tasks and killing a child is really a non-starter because 
there are too many critical use cases (we have many) that would be 
unfairly biased against.
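
For a sense of what "slightly penalizing" could mean, here is a sketch
under stated assumptions.  This message does not spell out the penalty
formula, so the one below (average child size scaled by how far the child
count exceeds the threshold) is purely illustrative, and the function
name is made up.

```python
def forkbomb_penalty(child_rss_pages, thres):
    """Hypothetical penalty, in pages, added to a parent's score.

    child_rss_pages: resident size of each child, in pages.
    thres: stand-in for the oom_forkbomb_thres tunable.
    """
    count = len(child_rss_pages)
    if thres <= 0 or count < thres:
        return 0  # below the threshold: no penalty at all
    avg = sum(child_rss_pages) // count
    # Assumed scaling: grows with the number of children past the threshold.
    return avg * count // thres


small = forkbomb_penalty([100] * 10, thres=1000)
large = forkbomb_penalty([100] * 2000, thres=1000)
```

Under these assumptions a parent with 10 children is not penalized, and
one with 2000 children of 100 pages each gets 200 pages added to its
score, which is small next to the 200000 pages its children hold.  The
parent becomes a somewhat more likely selection, and a child is still the
one that gets killed.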