Message-ID: <alpine.DEB.2.00.1011141322590.22262@chino.kir.corp.google.com>
Date: Sun, 14 Nov 2010 13:29:44 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
cc: "Figo.zhang" <figo1802@...il.com>,
lkml <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
Andrew Morton <akpm@...l.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v2]mm/oom-kill: direct hardware access processes should
get bonus
On Sun, 14 Nov 2010, KOSAKI Motohiro wrote:
> > So the question that needs to be answered is: why do these threads deserve
> > to use 3% more memory (not >4%) than others without getting killed? If
> > there was some evidence that these threads have a certain quantity of
> > memory they require as a fundamental attribute of CAP_SYS_RAWIO, then I
> > have no objection, but that's going to be expressed in a memory quantity
> > not a percentage as you have here.
>
> 3% was chosen by you :-/
>
No, 3% was chosen in __vm_enough_memory() for LSMs as the comment in the
oom killer shows:
	/*
	 * Root processes get 3% bonus, just like the __vm_enough_memory()
	 * implementation used by LSMs.
	 */
and is described in Documentation/filesystems/proc.txt.
For heuristics like this, where we obviously want to give some bonus to
CAP_SYS_ADMIN, I think the bonus should be consistent with the bonuses
given elsewhere in the kernel.
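As a rough illustration of what a 3% bonus means on a proportional scale,
here is a minimal userspace sketch. It assumes a score from 0 to 1000 where
each point is 0.1% of total memory, so 3% is 30 points. The names
task_model and badness_model are hypothetical and this is not the kernel
implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a task; not a kernel structure. */
struct task_model {
    unsigned long pages_used; /* pages charged to the task */
    bool has_sys_admin;       /* stands in for a CAP_SYS_ADMIN check */
    int oom_score_adj;        /* userspace tunable, -1000..1000 */
};

/* Score on a 0..1000 scale: each point is 0.1% of total memory. */
static int badness_model(const struct task_model *t, unsigned long totalpages)
{
    assert(totalpages > 0);

    long points = (long)(t->pages_used * 1000UL / totalpages);

    if (t->has_sys_admin)
        points -= 30;            /* assumed 3% bonus: 30 of 1000 points */

    points += t->oom_score_adj;  /* userspace bias is applied last */

    if (points < 0)
        return 0;
    return points > 1000 ? 1000 : (int)points;
}
```

With 1000 total pages, a task using 500 pages scores 500 without the bonus
and 470 with it, so the bonus is a fixed share of memory rather than a
multiplier on the task's own usage.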
> Old background is very simple and cleaner.
>
The old heuristic divided the arbitrary badness score by 4 for tasks with
CAP_SYS_RESOURCE. The new heuristic doesn't consider that capability.
How is the old behavior cleaner?
> CAP_SYS_RESOURCE means the process has a privilege of using more resource.
> then, oom-killer gave it additional bonus.
>
As a side effect of being allowed to allocate more resources, those
applications are relatively unbounded in memory consumption compared to
other tasks. Thus, it's possible that such an application is using a
massive amount of memory (say, 75%), and with the proposed change a
task using 25% of memory would be killed instead. This increases the
likelihood that the CAP_SYS_RESOURCE thread will have to be killed
eventually anyway, and the goal is to kill as few tasks as possible to
free a sufficient amount of memory.
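The arithmetic behind the 75% versus 25% case can be sketched as below.
This assumes the bias in question is a divide-by-4 like the old heuristic's,
which is one reading of the discussion and is not spelled out here. Scores
are on a 0..1000 scale, and quartered_score and pick_victim are hypothetical
helper names.

```c
#include <assert.h>

/* Assumed bias: divide a 0..1000 score by 4 (integer division). */
static int quartered_score(int score)
{
    assert(score >= 0);
    return score / 4;
}

/* Returns 0 if the first task scores at least as high, otherwise 1. */
static int pick_victim(int score_first, int score_second)
{
    return score_second > score_first ? 1 : 0;
}
```

Under that assumption, pick_victim(750, 250) selects the first task, while
pick_victim(quartered_score(750), 250) compares 187 against 250 and selects
the second, smaller task even though killing it frees far less memory.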
Since threads having CAP_SYS_RESOURCE have full control over their
oom_score_adj, they can take additional precautions to protect
themselves if necessary. The heuristic doesn't need to bias these tasks
by default, which leads to the undesired result described above; that
bias should be applied intentionally from userspace.
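A minimal sketch of how a task might protect itself from userspace by
writing its own oom_score_adj. The helper name write_oom_score_adj is
hypothetical, the path is passed in so it can be pointed at
/proc/self/oom_score_adj, and the -1000..1000 range is the documented range
of that tunable. Lowering the value is expected to fail without
CAP_SYS_RESOURCE.

```c
#include <assert.h>
#include <stdio.h>

#define ADJ_MIN (-1000)
#define ADJ_MAX 1000

/*
 * Writes adj to an oom_score_adj-style file.
 * Returns 0 on success, -1 on a bad value or I/O error.
 */
static int write_oom_score_adj(const char *path, int adj)
{
    assert(path != NULL);

    if (adj < ADJ_MIN || adj > ADJ_MAX)
        return -1;

    FILE *f = fopen(path, "w");
    if (!f)
        return -1;

    int ok = fprintf(f, "%d\n", adj) > 0;

    /* Buffered write errors may only surface at close time. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

For example, write_oom_score_adj("/proc/self/oom_score_adj", -500) would
ask the kernel to subtract 500 points from the calling task's score.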
> CAP_SYS_RAWIO mean the process has a direct hardware access privilege
> (eg X.org, RDB). and then, killing it might make the system crash.
>
Then you would want to explicitly exclude these tasks from oom kill, just
as OOM_SCORE_ADJ_MIN does, rather than giving them a memory quantity bonus.
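The difference between filtering and a bonus can be sketched as follows: an
exempt task is skipped outright, no matter how high it scores. This is a
hypothetical userspace model; struct candidate, select_victim and
MODEL_SCORE_ADJ_MIN are illustrative names, with -1000 assumed to match
OOM_SCORE_ADJ_MIN.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SCORE_ADJ_MIN (-1000) /* assumed value of OOM_SCORE_ADJ_MIN */

/* Hypothetical candidate record; not a kernel structure. */
struct candidate {
    int score;         /* 0..1000 badness-style score */
    int oom_score_adj; /* MODEL_SCORE_ADJ_MIN marks the task as exempt */
};

/* Returns the index of the highest-scoring non-exempt task, or -1 if none. */
static int select_victim(const struct candidate *c, size_t n)
{
    assert(c != NULL || n == 0);

    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (c[i].oom_score_adj == MODEL_SCORE_ADJ_MIN)
            continue; /* filtered: never eligible, regardless of score */
        if (best < 0 || c[i].score > c[best].score)
            best = (int)i;
    }
    return best;
}
```

A task scoring 900 with the minimum adjustment is never chosen, while a
task scoring 300 with no adjustment is, which is a different guarantee from
subtracting a fixed number of points.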