Message-ID: <alpine.DEB.2.00.0901130134090.25386@chino.kir.corp.google.com>
Date: Tue, 13 Jan 2009 01:54:02 -0800 (PST)
From: David Rientjes <rientjes@...gle.com>
To: Evgeniy Polyakov <zbr@...emap.net>
cc: Bill Davidsen <davidsen@....com>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: Linux killed Kenny, bastard!
On Tue, 13 Jan 2009, Evgeniy Polyakov wrote:
> That is theory, not practice. Most of the time the OOM-killer starts
> with ssh, the database and lighttpd on the tested machines, when it
> could go in the reverse order and not touch ssh at all. Better still,
> it would start not with the daemon itself but with its spawned fastcgi
> processes.
>
In the unconstrained system-wide oom case, it scans each task on the
system (which can take a very long time, ask SGI) and computes a badness
score for each. When a memory-hogging task is identified, a choice you
have complete control over from userspace by tuning /proc/pid/oom_adj, it
attempts to kill a child first if doing so will free memory without
killing the parent.
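
As a rough illustration of the oom_adj tuning described above, the sketch below writes an adjustment value for a given pid. The helper names and the proc_root parameter are hypothetical (the latter only makes the code testable against a scratch directory). The value range of -16 to +15, plus -17 to exempt a task entirely, is the one documented for oom_adj in kernels of this era.

```python
import os

OOM_DISABLE = -17      # special value: never pick this task
OOM_ADJUST_MIN = -16   # least likely to be killed
OOM_ADJUST_MAX = 15    # most likely to be killed


def set_oom_adj(pid, value, proc_root="/proc"):
    """Write an oom_adj value for `pid` (hypothetical helper).

    Lowering the value normally needs CAP_SYS_RESOURCE; raising it
    on your own tasks does not.
    """
    if value != OOM_DISABLE and not (OOM_ADJUST_MIN <= value <= OOM_ADJUST_MAX):
        raise ValueError("oom_adj must be -17 or in the range -16..15")
    path = os.path.join(proc_root, str(pid), "oom_adj")
    with open(path, "w") as f:
        f.write(str(value))
    return path


def get_oom_adj(pid, proc_root="/proc"):
    """Read back the current oom_adj value for `pid`."""
    with open(os.path.join(proc_root, str(pid), "oom_adj")) as f:
        return int(f.read().strip())
```

For example, set_oom_adj(sshd_pid, -17) would keep sshd out of consideration, while a positive value on a fastcgi worker makes it a preferred victim.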
> I agree that there are ways to tune the way oom-killer selects the
> victim, and likely after hours of games this subtlety will work for the
> specified workload.
It doesn't involve "hours of games," it is a very simple heuristic that
you can easily tune to specify your preferences.
What you're looking for with your patch is simply a way to specify an oom
preference before the task has been forked, but that's simple to do with
the current logic since oom_adj scores are inherited and preference is
given to killing a child before parent.
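
Because oom_adj is inherited, a preference can be set for a task before it ever runs its own code. A minimal sketch, assuming a launcher that writes /proc/self/oom_adj between fork and exec; spawn_with_oom_adj and its proc_root parameter are hypothetical names introduced here, not part of any existing tool.

```python
import os
import subprocess


def spawn_with_oom_adj(argv, value, proc_root="/proc"):
    """Start `argv` with oom_adj already set (hypothetical helper).

    The write happens in the forked child before exec, so the new
    program and everything it forks later inherit the value.
    """
    path = os.path.join(proc_root, "self", "oom_adj")

    def _tune():
        # Runs in the child, after fork() and before exec().
        with open(path, "w") as f:
            f.write(str(value))

    return subprocess.Popen(argv, preexec_fn=_tune)
```

A wrapper like this could start a web server's fastcgi workers with a positive value so that they are chosen ahead of the daemon that spawned them.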
> What I propose is the simplest way for the most
> commonly used case.
No, procfs is the correct interface for tuning oom kill preferences, not
name parsing.
With oom_adj scores, you have the ability to specify oom kill preferences
within a cpuset or memory controller as well, whereas oom_victim_name is
global and very costly when not found in select_bad_process().
> It is a help for the admin, not something that forces them to
> invent complex machinery which will be error-prone and hard to debug
> when oom eventually happens.
It's very simple to debug the oom killer's decisions, which is why I
introduced /proc/sys/vm/oom_dump_tasks.
Your patch also requires two expensive scans of the entire tasklist when
oom_victim_name isn't found (I introduced
/proc/sys/vm/oom_kill_allocating_task specifically to avoid _one_
expensive scan).
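
The two /proc/sys/vm files named in this message, oom_dump_tasks and oom_kill_allocating_task, are on/off switches that can be flipped by writing 0 or 1. A small sketch, assuming root privileges on a real system; the helper names and the sysctl_root parameter are hypothetical and exist only for illustration.

```python
import os

# The only knobs this sketch touches; both are named in the message.
OOM_SYSCTLS = ("oom_dump_tasks", "oom_kill_allocating_task")


def set_oom_sysctl(name, enabled, sysctl_root="/proc/sys/vm"):
    """Enable or disable one of the oom sysctls (hypothetical helper)."""
    if name not in OOM_SYSCTLS:
        raise ValueError("unknown oom sysctl: %r" % (name,))
    with open(os.path.join(sysctl_root, name), "w") as f:
        f.write("1" if enabled else "0")


def get_oom_sysctl(name, sysctl_root="/proc/sys/vm"):
    """Return True if the sysctl currently holds a non-zero value."""
    if name not in OOM_SYSCTLS:
        raise ValueError("unknown oom sysctl: %r" % (name,))
    with open(os.path.join(sysctl_root, name)) as f:
        return int(f.read().strip()) != 0
```

Enabling oom_dump_tasks makes the kernel log a per-task dump when the oom killer runs, which is what makes its decisions easy to audit afterwards.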