linux-kernel - Re: [PATCH] Revert oom rewrite series

Message-ID: <20101117004854.GA7153@google.com>
Date:	Tue, 16 Nov 2010 16:48:54 -0800
From:	Mandeep Singh Baines <msb@...omium.org>
To:	Bodo Eggert <7eggert@....de>
Cc:	David Rientjes <rientjes@...gle.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ying Han <yinghan@...gle.com>, Bodo Eggert <7eggert@....de>,
	"Figo.zhang" <figo1802@...il.com>
Subject: Re: [PATCH] Revert oom rewrite series

Bodo Eggert (7eggert@....de) wrote:
> On Mon, 15 Nov 2010, David Rientjes wrote:
> > On Tue, 16 Nov 2010, Bodo Eggert wrote:
> 
> > > > CAP_SYS_RESOURCE threads have full control over their oom killing priority
> > > > by /proc/pid/oom_score_adj
> > > 
> > > , but unless they are written in the last months and designed for linux
> > > and if the author took some time to research each external process invocation,
> > > they can not be aware of this possibility.
> > > 
> > 
> > You're clearly wrong, CAP_SYS_RESOURCE has been required to modify oom_adj 
> > for over five years (as long as the git history).  8fb4fc68, merged into 
> > 2.6.20, allowed tasks to raise their own oom_adj but not decrease it.  
> > That is unchanged by the rewrite.
> 
> You are misunderstanding me. It was allowed to do this, but it did not need 
> to do it yet. It was enough to be a well-written POSIX application without 
> linux-specific OOM hacks for some specific kernel versions.
> 
> > > Besides that, if each process is supposed to change the default, the default
> > > is wrong.
> > 
> > That doesn't make any sense, if you want to protect a thread from the oom 
> > killer you're going to need to modify oom_score_adj, the kernel can't know 
> > what you perceive as being vital.  Having CAP_SYS_RESOURCE alone does not 
> > imply that, it only allows unbounded access to resources.  That's 
> > completely orthogonal to the goal of the oom killer heuristic, which is to 
> > find the most memory-hogging task to kill.
> 
> The old oom killer's task was to guess the best victim to kill. For me, it 
> did a good job (but the system kept thrashing for too long until it kicked

Here's a patch I've been working on to control thrashing.

http://lkml.org/lkml/2010/10/28/289

It works well for our application, a web browser: we'd rather OOM quickly
and kill a browser tab than thrash for a few minutes and then OOM. I'm
working on a more generally useful solution.

> the offender). Looking at CAP_SYS_RESOURCE was one way to recognize 
> important processes.
> 
> > > 1) The exponential scale did have a low resolution.
> > > 
> > > 2) The heuristics were developed using much brain power and much
> > >    trial-and-error. You are going back to basics, and some people
> > >    are not convinced that this is better. I googled and I did not
> > >    find a discussion about how and why the new score was designed
> > >    this way.
> > >    looking at the output of:
> > >    cd /proc; for a in [0-9]*; do
> > >      echo `cat $a/oom_score` $a `perl -pes/'\0.*$'// < $a/cmdline`;
> > >    done|grep -v ^0|sort -n |less
> > >    , I'm not convinced either.
> > > 
> > 
> > The old heuristics were a mixture of arbitrary values that didn't adjust 
> > scores based on a unit and would often cause the incorrect task to be 
> > targeted because there was no clear goal being achieved.  The new 
> > heuristic has a solid goal: to identify and kill the most memory-hogging 
> > task that is eligible given the context in which the oom occurs.  If you 
> > disagree with that goal and want any of the old heuristics reintroduced, 
> > please show that it makes sense in the oom killer.
> 
> The first old OOM killer did the same as you promise the current one does,
> except for your bugfixes. That's why it killed the wrong applications and
> all the heuristics were added until the complaints stopped.
> 
> Of course I did not yet test your OOM killer, maybe it really is better.
> Heuristics tend to rot and you did much work to make it right.
> 
> I don't want the old OOM killer back, but I don't want you to fall
> into the same pits as the pre-old OOM killer used to do.
> 
> > > PS) Mapping an exponential value to a linear score is bad. E.g. A
> > >     oom_adj of 8 should make an 1-MB-process as likely to kill as
> > >     a 256-MB-process with oom_adj=0.
> > > 
> > 
> > To show that, you would have to show that an application that exists today 
> > uses an oom_adj for something other than polarization and is based on a 
> > calculation of allowable memory usage.  It simply doesn't exist.
> 
> No such application should exist because the OOM killer should DTRT.
> oom_adj was supposed to let the sysadmin lower his mission-critical
> DB's score to be just lower than the less-important tasks, or to
> point the kernel to his ever-faulty and easily-restarted browser.
> 
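To make the arithmetic in the PS above concrete, here is a rough sketch.
It rests on two assumptions about the kernel sources that are not stated
in this thread: that the old heuristic applied oom_adj as a bit shift on
the badness score, and that the rewrite converts a legacy oom_adj to
oom_score_adj as oom_adj * 1000 / 17 (with 15 pinned to 1000). The
function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: old-style exponential adjustment, score shifted by oom_adj bits.
old_scaled_points() {
    points=$1
    adj=$2
    if [ "$adj" -gt 0 ]; then
        # A zero score is bumped to 1 so the shift has an effect.
        if [ "$points" -eq 0 ]; then points=1; fi
        echo $(( $points << $adj ))
    else
        echo $(( $points >> (0 - $adj) ))
    fi
}

# Assumption: linear mapping of oom_adj (-17..15) onto oom_score_adj (-1000..1000).
linear_score_adj() {
    adj=$1
    if [ "$adj" -eq 15 ]; then
        echo 1000
    else
        # Shell integer division truncates toward zero, like C.
        echo $(( $adj * 1000 / 17 ))
    fi
}

echo "old: 1 MB at oom_adj=8 -> $(old_scaled_points 1 8)," \
     "256 MB at oom_adj=0 -> $(old_scaled_points 256 0)"
echo "linear: oom_adj=8 -> oom_score_adj=$(linear_score_adj 8)"
```

Under those assumptions, oom_adj=8 gives a 1 MB process the same old-style
score as a 256 MB process at oom_adj=0, while the linear mapping turns the
same setting into a fixed additive offset that does not depend on process
size.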
> > > PS2) Because I saw this in your presentation PDF: (@udev-people)
> > >     The -17 score of udevd is wrong, since it will even prevent
> > >     the OOM killer from working correctly if it grows to 100 MB:
> > > 
> > 
> > Threads with CAP_SYS_RESOURCE are free to lower the oom_score_adj of any 
> > thread they deem fit and that includes applications that lower their own 
> > oom_score_adj.  The kernel isn't going to prohibit users from setting 
> > their own oom_score_adj.
> 
> My point is: The udev people should not prevent the OOM killer 
> unconditionally, it has an important task in case something goes wrong.
> I just didn't want to start a new thread at that time of day.
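For anyone who wants to try the knob being discussed, a minimal sketch of
inspecting and setting oom_score_adj from the shell follows. The helper
names and the optional proc-root argument are hypothetical (the latter only
makes the helpers easy to try against a scratch directory). Raising a
task's own value should need no privilege, while lowering it needs
CAP_SYS_RESOURCE, as noted earlier in the thread.

```shell
#!/bin/sh
# Print the OOM score and adjustment for one PID.
# $2 is an optional proc root (hypothetical parameter, defaults to /proc).
show_oom() {
    pid=$1
    root=${2:-/proc}
    printf '%s score=%s adj=%s\n' "$pid" \
        "$(cat "$root/$pid/oom_score")" \
        "$(cat "$root/$pid/oom_score_adj")"
}

# Write a new oom_score_adj (-1000..1000) for one PID.
# $3 is the same optional proc root as above.
set_oom_score_adj() {
    pid=$1
    value=$2
    root=${3:-/proc}
    printf '%s\n' "$value" > "$root/$pid/oom_score_adj"
}

# Only touch the real file when it exists and is writable on this system.
if [ -w "/proc/$$/oom_score_adj" ]; then
    show_oom $$
    set_oom_score_adj $$ 500   # make this shell a more likely victim
    show_oom $$
fi
```

A value of -1000 exempts the task entirely, which is the new-interface
counterpart of the -17 oom_adj that udevd is criticised for above.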
> -- 
> How do I set my laser printer on stun?