[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090113224941.GA14730@mit.edu>
Date: Tue, 13 Jan 2009 17:49:41 -0500
From: Theodore Tso <tytso@....edu>
To: Evgeniy Polyakov <zbr@...emap.net>
Cc: David Rientjes <rientjes@...gle.com>,
Alan Cox <alan@...rguk.ukuu.org.uk>,
linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: Linux killed Kenny, bastard!
On Wed, Jan 14, 2009 at 12:46:27AM +0300, Evgeniy Polyakov wrote:
> User does not work with the some magically calculated scores, he just
> starts the processes and knows only their names. User can specify pid,
> but in the case of short-living connections it is not possible. Changing
> parent oom score opens a huge possibility to kill it, while in case of
> some application server (or database) it should never be killed, and
> only some of its clients (which work for the users and not for the
> calculating backend for example) have to be killed.
The standard way this gets handled for resource limits is very simple:
1) parent forks the child process
2) in the child process we set up resource limits, adjust oom
3) exec the child's program.
As Alan has already pointed out to you:
(echo XXXX > /proc/self/oom_adj ; exec /usr/bin/program)
There are two problems; one is whether or not the OOM protection is
inherited or not, and how one sets OOM protection --- and I think you
will find a huge resistance to using names as a way of expressing
policy.
The second problem is that oom_adj scoring is a hueristic which is
hard for system administrators to understand --- and these are
separable problems. Don't try to conflate them, and try using the
fact that a random score echo'ed into /proc/pid/oom_adj is hard to
tune as a justification for using process executable names.
If you want to argue that using containers is too hard, and there out
to be a simpler tuning parameter where (for the sake of argument) all
processes are given a number from 0 to 10, where 5 is the default, and
higher numbers will be picked unconditionally over lower numbers, and
the existing OOM score is used to distinguish between two process with
the same OOM protection, that's fine.
How we set that OOM protection class, whether it is via setrlimit() or
echoing into a magic /proc/pid/oom_protection file, and whether it
inherits across fork and exec calls, are a separate question.
- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists