Message-Id: <201002111050.36709.l.lunak@suse.cz>
Date: Thu, 11 Feb 2010 10:50:36 +0100
From: Lubos Lunak <l.lunak@...e.cz>
To: Alan Cox <alan@...rguk.ukuu.org.uk>
Cc: Rik van Riel <riel@...hat.com>,
David Rientjes <rientjes@...gle.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Nick Piggin <npiggin@...e.de>, Jiri Kosina <jkosina@...e.cz>
Subject: Re: Improving OOM killer
On Wednesday 10 of February 2010, Alan Cox wrote:
> > Killing the system daemon *is* a DoS.
> >
> > It would stop eg. the database or the web server, which is
> > generally the main task of systems that run a database or
> > a web server.
>
> One of the problems with picking on tasks that fork a lot is that
> describes apache perfectly. So a high loaded apache will get shot over a
> rapid memory eating cgi script.
It will not. If it's only a single cgi script, then that child should be
selected by badness(), not the parent.
I personally consider the logic of trying to find the offender using
badness() and then killing its child instead to be flawed. badness() itself
should already select what to kill, and that is what should be killed. If a
single process is the offender, it should be killed. If badness() decides a
whole subtree is responsible for the situation, then the top of it needs to be
killed, otherwise the cause of the problem will remain.
I expect the current logic of trying to kill children first is based on the
system-daemon case, but if e.g. the Apache master process itself causes the
OOM, the kernel has no way to find out whether it's an important process
that should be protected or some random process causing a forkbomb.
From the kernel's point of view, if the Apache master process caused the
problem, then the problem should be solved there. If the real reason for the
problem was e.g. a temporary high load on the server, then Apache is
probably misconfigured, and if it really should stay running no matter what,
then I guess that's the case for using oom_adj. But otherwise, from the OOM
killer's point of view, that is where the problem was.
Of course, the algorithm used in badness() should be careful not to propagate
the excessive memory usage in that case to the innocent parent. This problem
existed in the current code until the recent "/2" fix, and at least my
current proposal actually suffers from it too. But I envision that something
like this could handle it nicely (pseudocode):
int oom_children_memory_usage(task)
{
    // Memory shared with the parent should not be counted again.
    // Since it's expensive to find that out exactly, just assume
    // that the amount of shared memory that is not shared with the parent
    // is insignificant.
    total = unshared_rss(task) + unshared_swap(task);
    foreach_child(child, task)
        total += oom_children_memory_usage(child);
    return total;
}

int badness(task)
{
    int total_memory = 0;
    ...
    int max_child_memory = 0;   // memory used by the biggest child
    int max_child_memory_2 = 0; // the 2nd most memory used by a child
    foreach_child(child, task)
    {
        if( sharing_the_same_memory(child, task))
            continue;
        if( real_time(child) > 1minute )
            continue; // running long, not a forkbomb
        int memory = oom_children_memory_usage(child);
        total_memory += memory;
        if( memory > max_child_memory )
        {
            max_child_memory_2 = max_child_memory;
            max_child_memory = memory;
        }
        else if( memory > max_child_memory_2 )
            max_child_memory_2 = memory;
    }
    if( max_child_memory_2 != 0 ) // there were at least two children
    {
        if( max_child_memory > total_memory / 2 )
        {
            // A single child contributes the majority of the memory
            // used by all children. Do not add it to the total, so that
            // if that process is the biggest offender, the killer picks
            // it instead of this parent.
            total_memory -= max_child_memory;
        }
    }
    ...
}
The logic is simply that a process is responsible for its children only if
their cost is similar; if one of them stands out, it is responsible for
itself and the parent is not. This is intentionally not done recursively in
oom_children_memory_usage(), also to cover the case where e.g. a parallel
make runs too many processes wrapped in shells: in that case making any of
those shell instances responsible for its child doesn't help anything, but
making make responsible for all of them does.
Alternatively, if somebody has a good use case where first going after a
child may make sense, then it would perhaps help to add an
'oom_recently_killed_children' counter to each task and increment it whenever
a child is killed instead of the responsible parent. As soon as the value
grows higher than, let's say, 5 within a reasonably short time, then
apparently killing children does not help and the mastermind has to go.
--
Lubos Lunak
openSUSE Boosters team, KDE developer
l.lunak@...e.cz , l.lunak@....org