linux-kernel - Re: Improving OOM killer

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <201002111116.07211.l.lunak@suse.cz>
Date:	Thu, 11 Feb 2010 11:16:07 +0100
From:	Lubos Lunak <l.lunak@...e.cz>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Rik van Riel <riel@...hat.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Nick Piggin <npiggin@...e.de>, Jiri Kosina <jkosina@...e.cz>
Subject: Re: Improving OOM killer

On Wednesday 10 of February 2010, David Rientjes wrote:
> On Wed, 10 Feb 2010, Lubos Lunak wrote:
> >  Which is why I suggested summing up the memory of the parent and its
> > children.
>
> That's almost identical to the current heuristic where we sum half the
> size of the children's VM size, unfortunately it's not a good indicator of
> forkbombs since in your particular example it would be detrimental to
> kdeinit.

 I believe that with the algorithm no longer using VmSize and being careful 
not to count shared memory more than once this would not be an issue and 
kdeinit would be reasonably safe. KDE does not use _that_ much memory to 
score higher than something that caused OOM :).

> My heursitic considers runtime of the children as an indicator 
> of a forkbombing parent since such tasks don't typically get to run
> anyway.  The rss or swap usage of a child with a seperate address space
> simply isn't relevant to the badness score of the parent, it unfairly
> penalizes medium/large server jobs.

 Our definitions of 'forkbomb' then perhaps differ a bit. I 
consider 'make -j100' a kind of a forkbomb too, it will very likely overload 
the machine too as soon as the gcc instances use up all the memory. For that 
reason also using CPU time <1second will not work here, while using real time 
<1minute would.

 That long timeout would have the weakness that when running at the same time 
reasonable 'make -j4' and Firefox that'd immediatelly go crazy, then maybe 
the make job could be targeted instead if its total cost would go higher. 
However, here I again believe that the fixed metrics for computing memory 
usage would work well enough to let that happen only when the total cost of 
the make job would be actually higher than that of the offender and in that 
case it is kind of an offender too.

 Your protection seems to cover only "for(;;) if(fork() == 0) break;" , while 
I believe mine could handle also "make -j100" or the bash forkbomb ":()
{ :|:& };:" (i.e. "for(;;) fork();").

> > > We can't address recursive forkbombing in the oom killer with any
> > > efficiency, but luckily those cases aren't very common.
> >
> >  Right, I've never run a recursive make that brought my machine to its
> > knees. Oh, wait.
>
> That's completely outside the scope of the oom killer, though: it is _not_
> the oom killer's responsibility for enforcing a kernel-wide forkbomb
> policy

 Why? It repeatedly causes OOM here (and in fact it is the only common OOM or 
forkbomb I ever encounter). If OOM killer is the right place to protect 
against a forkbomb that spawns a large number of 1st level children, then I 
don't see how this is different.

> >  And why exactly is iterating over 1st level children efficient enough
> > and doing that recursively is not? I don't find it significantly more
> > expensive and badness() is hardly a bottleneck anyway.
>
> If we look at children's memory usage recursively, then we'll always end
> up selecting init_task.

 Not if the algorithm does not propagate the top of the problematic subtree 
higher, see my reply to Alan Cox.

> >  Why exactly do you think only 1st generation children matter? Look again
> > at the process tree posted by me and you'll see it solves nothing there.
> > I still fail to see why counting also all other generations should be
> > considered anything more than a negligible penalty for something that's
> > not a bottleneck at all.
>
> You're specifying a problem that is outside the scope of the oom killer,
> sorry.

 But it could be inside of the scope, since it causes OOM, and I don't think 
it's an unrealistic or rare use case. I don't consider it less likely than 
spawning a large number of direct children. If you want to cover only 
certified reasons for causing OOM, it can be as well said that userspace is 
not allowed to cause OOM at all.

-- 
 Lubos Lunak
 openSUSE Boosters team, KDE developer
 l.lunak@...e.cz , l.lunak@....org
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/