linux-kernel - Re: [PATCH] oom_kill: use rss value instead of vm size for badness

Message-Id: <20091201131509.5C19.A69D9226@jp.fujitsu.com>
Date:	Tue,  1 Dec 2009 13:43:34 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	David Rientjes <rientjes@...gle.com>
Cc:	kosaki.motohiro@...fujitsu.com,
	Andrea Arcangeli <aarcange@...hat.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Hugh Dickins <hugh.dickins@...cali.co.uk>,
	vedran.furac@...il.com
Subject: Re: [PATCH] oom_kill: use rss value instead of vm size for badness

> On Fri, 27 Nov 2009, Andrea Arcangeli wrote:
> 
> > OK, I can see that the fact that it's dynamic and less predictable
> > worries you. The "second to last" tasks especially are going to be
> > less predictable, but the memory hog would normally end up accounting
> > for most of the memory, and this should increase the badness delta
> > between the offending task (or tasks) and the innocent stuff, making
> > it more reliable. The innocent stuff should be paged out of RAM more
> > and more. So I tend to think it'll be much less likely to kill an
> > innocent task this way (as demonstrated in practice by your
> > measurement too). It's true there's no guarantee it'll always do the
> > right thing, because it's a heuristic anyway, but even total_vm
> > doesn't provide a guarantee unless your workload is stationary, your
> > badness scores are fixed, no virtual memory is ever allocated by any
> > task in the system, and no new tasks are spawned.
> > 
> 
> The purpose of /proc/pid/oom_adj is not always to polarize the heuristic 
> for the task it represents; it allows userspace to define when a task is 
> rogue.  Working with total_vm as a baseline, it is simple to use the 
> interface to tune the heuristic to prefer a certain task over another when 
> its memory consumption goes beyond what is expected.  With this interface, 
> I can easily define when an application should be oom killed because it is 
> using far more memory than expected.  I can also disable oom killing 
> completely for it, if necessary.  Unless you have a consistent baseline 
> for all tasks, the adjustment doesn't make any sense in context.  Using 
> rss does not allow users to statically define when a task is rogue, since 
> the score depends on the current state of memory at the time of the oom.
> 
> I would support removing most of the heuristics other than the 
> baseline and the nodes' intersection with mems_allowed (to prefer tasks 
> in the same cpuset), though, to make it easier to understand and tune.
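To make the total_vm versus rss argument concrete, here is a toy model in Python. It is only loosely modeled on the 2.6.32-era badness(): a baseline score, a divide-by-8 penalty when a task's nodes don't intersect mems_allowed, then the oom_adj shift (left shift for positive values, right shift for negative, -17 to disable). It deliberately omits the other heuristics (child processes, CPU time, nice, capabilities). The Task fields, select_victim(), and all page counts are hypothetical and chosen purely for illustration; this is not the kernel's code.

```python
from dataclasses import dataclass

OOM_DISABLE = -17  # oom_adj value that exempts a task from OOM killing


@dataclass
class Task:
    # Hypothetical snapshot of a task when the OOM fires; page counts are made up.
    comm: str
    total_vm: int                 # mapped virtual pages
    rss: int                      # resident pages at OOM time
    oom_adj: int = 0              # -16..15, or OOM_DISABLE
    intersects_mems: bool = True  # do its nodes overlap the OOM context's mems_allowed?


def badness(t: Task, use_rss: bool) -> int:
    """Simplified score: baseline, cpuset penalty, then the oom_adj shift."""
    if t.oom_adj == OOM_DISABLE:
        return 0
    points = t.rss if use_rss else t.total_vm
    if not t.intersects_mems:
        points //= 8  # killing it is less likely to free memory on our nodes
    if t.oom_adj > 0:
        points = max(points, 1) << t.oom_adj
    elif t.oom_adj < 0:
        points >>= -t.oom_adj
    return points


def select_victim(tasks, use_rss):
    """Return the highest-scoring task, or None if every score is 0."""
    best = max(tasks, key=lambda t: badness(t, use_rss))
    return best if badness(best, use_rss) > 0 else None


tasks = [
    Task("db", total_vm=400_000, rss=300_000, oom_adj=-2),
    Task("hog", total_vm=250_000, rss=240_000),
    Task("idle_daemon", total_vm=600_000, rss=5_000),
]
print(select_victim(tasks, use_rss=False).comm)  # idle_daemon
print(select_victim(tasks, use_rss=True).comm)   # hog
```

With total_vm as the baseline, the large but mostly non-resident idle_daemon is chosen; with rss, the task actually consuming memory (hog) is chosen, which is Andrea's point. The flip side, and David's objection, is that the rss result depends on what happens to be resident when the OOM fires, so a fixed oom_adj value no longer corresponds to a fixed "rogue" threshold.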

It sounds like you are saying that oom_adj doesn't fit your use case.
Probably you need a new knob, /proc/{pid}/oom_priority, rather than an
oom adjustment: what you need is an oom killing order based on job
severity. Severity doesn't depend on any heuristic; the server
administrator should know the severity of each job on his system.
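A minimal sketch of the severity-ordered selection suggested above, assuming a hypothetical per-task priority value standing in for the proposed /proc/{pid}/oom_priority knob. The convention that a higher value means more important, and the rss tie-break within one priority level, are both assumptions made for illustration; the proposal itself does not specify them. The Job type and kill_order() are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical: 'priority' stands in for the proposed oom_priority knob.
    # Assumed convention: a higher value means more important (killed later).
    pid: int
    name: str
    priority: int
    rss: int  # resident pages; used only to break ties (an assumption)


def kill_order(jobs):
    """Order jobs as an OOM killer would pick them: least important first,
    and within the same priority, the largest resident set first."""
    return sorted(jobs, key=lambda j: (j.priority, -j.rss))


jobs = [
    Job(101, "batch_report", priority=0, rss=50_000),
    Job(102, "web_frontend", priority=5, rss=120_000),
    Job(103, "log_shipper", priority=0, rss=80_000),
    Job(104, "database", priority=10, rss=900_000),
]
print([j.name for j in kill_order(jobs)])
# ['log_shipper', 'batch_report', 'web_frontend', 'database']
```

Because the administrator fixes the order explicitly, the victim does not shift with the memory state at OOM time; the rss tie-break is the only dynamic part and could be replaced by a static rule if full predictability is wanted.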

The OOM heuristic should mainly consider desktop usage, because desktop
users don't change oom knobs at all, and they don't know which daemons
are important. Any useful heuristic has some dynamic aspect; we can't
avoid it.

Thoughts?

