Message-ID: <alpine.LNX.2.00.1011152255580.17235@be10.lrz>
Date:	Tue, 16 Nov 2010 00:33:43 +0100 (CET)
From:	Bodo Eggert <7eggert@....de>
To:	David Rientjes <rientjes@...gle.com>
cc:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Ying Han <yinghan@...gle.com>, Bodo Eggert <7eggert@....de>,
	Mandeep Singh Baines <msb@...gle.com>,
	"Figo.zhang" <figo1802@...il.com>
Subject: Re: [PATCH] Revert oom rewrite series

On Sun, 14 Nov 2010, David Rientjes wrote:

> Also, stating that the new heuristic doesn't address CAP_SYS_RESOURCE
> appropriately isn't a bug report, it's the desired behavior.  I eliminated
> all of the arbitrary heuristics in the old heuristic that we had to
> remove internally as well so that it is as predictable as possible and achieves
> the oom killer's sole goal: to kill the most memory-hogging task that is
> eligible to allow memory allocations in the current context to succeed.

> CAP_SYS_RESOURCE threads have full control over their oom killing priority
> by /proc/pid/oom_score_adj

, but unless they were written in the last few months, were designed for
Linux, and their authors took the time to research each external process
invocation, they cannot be aware of this possibility.

Besides that, if each process is supposed to change the default, the 
default is wrong.

> and need no consideration in the heuristic by
> default since it otherwise allows for the probability that multiple tasks
> will need to be killed when a CAP_SYS_RESOURCE thread uses an egregious
> amount of memory.

If it happens to use an egregious amount of memory, it SHOULD score
high enough to get killed.

>> The problem is, DavidR's patches don't reflect real-world use cases at all
>> and are breaking them. He can say that userland is wrong, but such an
>> excuse doesn't solve the real-world issue. It makes no sense.
>
> As mentioned just a few minutes ago in another thread, there is no
> userspace breakage with the rewrite and you're only complaining here about
> the deprecation of /proc/pid/oom_adj for a period of two years.  Until
> it's removed in 2012 or later, it maps to the linear scale that
> oom_score_adj uses rather than its old exponential scale that was
> unusable for prioritization because of (1) the extremely low resolution,
> and (2) the arbitrary heuristics that preceded it.

1) The exponential scale did have a low resolution.

2) The heuristics were developed with a lot of thought and much
    trial and error. You are going back to basics, and some people
    are not convinced that this is better. I searched and did not
    find a discussion of how and why the new score was designed
    this way.
    Looking at the output of
    cd /proc; for a in [0-9]*; do
      echo "$(cat $a/oom_score) $a $(perl -pe 's/\0.*$//' < $a/cmdline)"
    done | grep -v '^0' | sort -n | less
    I am not convinced either.

PS) Mapping an exponential value to a linear score is bad. E.g. an
     oom_adj of 8 should make a 1 MB process as likely to be killed
     as a 256 MB process with oom_adj=0 (2^8 = 256).
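
To make the arithmetic behind this PS concrete, here is a minimal shell
sketch. It assumes that the old heuristic shifted the base score left by
oom_adj bits, and that the rewrite maps oom_adj to an additive offset of
oom_adj * 1000 / 17 on top of the task's memory share in thousandths. Both
are my reading of the two schemes, not code taken from the kernel. The
helper names and the 4096 MB machine size are made up for illustration.

```shell
# Hypothetical helpers, for illustration only (not kernel code).

# Old scheme (assumed): a positive oom_adj doubles the score once per step.
# $1 = process size in MB, $2 = oom_adj (0..15)
old_badness() {
    echo $(( $1 << $2 ))
}

# New scheme (assumed): memory share in thousandths of total memory,
# plus a linear offset derived from oom_adj.
# $1 = process size in MB, $2 = oom_adj, $3 = total memory in MB
new_badness() {
    echo $(( $1 * 1000 / $3 + $2 * 1000 / 17 ))
}

old_badness 1 8          # prints 256
old_badness 256 0        # prints 256
new_badness 1 8 4096     # prints 470
new_badness 256 0 4096   # prints 62
```

Under these assumptions the 1 MB process ties with the 256 MB process in
the old scheme but far outranks it in the new one, so the same oom_adj
value no longer expresses the same preference.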

PS2) Because I saw this in your presentation PDF: (@udev-people)
     The -17 score of udevd is wrong, since it will even prevent
     the OOM killer from working correctly if it grows to 100 MB:

     Its default OOM score is 13, while root's shell is at 190
     and some KDE processes are at 200 000. It will not get killed
     under normal circumstances.

     If udevd grows enough to score 190 as well, it has a bug
     that causes it to eat memory, and it needs to be killed. With
     an oom_adj of -17, it will cause the system to fail instead.
     Considering udevd's size, an adj of -1 or -2 should be enough on
     embedded systems, while desktop systems should not need it.
     If you are worried about udevd getting killed, protect it using
     a wrapper.