Message-ID: <20170922210519.GH828415@devbig577.frc2.facebook.com>
Date: Fri, 22 Sep 2017 14:05:19 -0700
From: Tejun Heo <tj@...nel.org>
To: David Rientjes <rientjes@...gle.com>
Cc: Johannes Weiner <hannes@...xchg.org>, Roman Gushchin <guro@...com>,
linux-mm@...ck.org, Michal Hocko <mhocko@...nel.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Andrew Morton <akpm@...ux-foundation.org>, kernel-team@...com,
cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [v8 0/4] cgroup-aware OOM killer

Hello,

On Fri, Sep 22, 2017 at 01:39:55PM -0700, David Rientjes wrote:
> The current heuristic based on processes is coupled with per-process
> /proc/pid/oom_score_adj.  The proposed heuristic has no ability to be
> influenced by userspace, and it needs one.  The proposed heuristic
> based on memory cgroups coupled with Roman's per-memcg
> memory.oom_priority is appropriate and needed.  It is not
> "sophisticated intelligence," it merely allows userspace to protect
> vital memory cgroups when opting into the new features (cgroups
> compared based on size and memory.oom_group) that we very much want.

So, this is where we disagree.  I don't think it's a good design: it
can't achieve that goal very well for a wide variety of users.
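
For context, the userspace steering David refers to is what exists
today for the per-process heuristic.  A minimal sketch (error handling
trimmed; note that lowering a task's value below its current one
requires CAP_SYS_RESOURCE):

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write an OOM adjustment for a task: -1000 exempts it from the OOM
 * killer entirely, +1000 makes it the preferred victim. */
static int set_oom_score_adj(pid_t pid, int adj)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/oom_score_adj", (int)pid);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", adj);
	return fclose(f);
}

int main(void)
{
	/* e.g. shield the current process from the OOM killer */
	return set_oom_score_adj(getpid(), -1000);
}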

> > We even change whole scheduling behaviors and try really hard not
> > to get locked into specific implementation details which exclude
> > future improvements.  Guaranteeing OOM killing selection would be
> > crazy.  Why would we prevent ourselves from doing things better in
> > the future?  We aren't talking about the semantics of read(2) here.
> > This is a kernel emergency mechanism to avoid deadlock at the last
> > moment.
>
> We merely want other memory cgroups to be oom killed before important
> ones on system oom conditions, regardless of whether the important one
> is using more memory than the others because of the new heuristic this
> patchset introduces.  This is exactly the same as
> /proc/pid/oom_score_adj for the current heuristic.

You were arguing that we should lock into a specific heuristic and
guarantee the same behavior.  We shouldn't.

When we introduce a user-visible interface, we're making a lot of
promises.  My point is that we need to be really careful when making
those promises.

> If you have this low priority maintenance job charging memory to the
> high priority hierarchy, you're already misconfigured unless you
> adjust /proc/pid/oom_score_adj, because it will oom kill any process
> larger than itself in today's kernels anyway.
>
> A better configuration would be to attach this hypothetical low
> priority maintenance job to its own sibling cgroup with its own
> memory limit to avoid exactly that problem: it going berserk and
> charging too much memory to the high priority container, which
> results in one of the container's processes getting oom killed.
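
Concretely, the layout being described would look something like the
following sketch (the cgroup paths and the pid are made up for
illustration; error handling trimmed):

#include <stdio.h>
#include <sys/stat.h>

static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	/* maintenance is a sibling of the high priority container
	 * (job/hipri), not a child of it */
	mkdir("/sys/fs/cgroup/job/maintenance", 0755);
	/* its own limit: a runaway gets capped and oom killed here
	 * instead of charging into the high priority container */
	write_str("/sys/fs/cgroup/job/maintenance/memory.max", "512M");
	/* move the maintenance task (pid 1234 here) over */
	write_str("/sys/fs/cgroup/job/maintenance/cgroup.procs", "1234");
	return 0;
}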

And how do you guarantee that across delegation boundaries?  The
points you raise on why the priority should be applied level-by-level
are exactly the reasons why this doesn't really work.  OOM killing
priority isn't something which can be distributed across the cgroup
hierarchy level-by-level: a priority set inside one delegated subtree
is only ever compared against its siblings, never against jobs
elsewhere in the tree, so the resulting decision tree doesn't make
any sense.
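
To make the objection concrete, strict level-by-level selection walks
the tree one level at a time, something like the sketch below.  The
types and helpers are made up, and I'm assuming a higher priority
value marks the preferred victim; the proposed interface may use the
opposite convention:

#include <stddef.h>

struct memcg {
	int oom_priority;	/* the proposed per-memcg knob */
	unsigned long usage;	/* aggregate memory footprint */
	int nr_children;
	struct memcg **children;
};

/* Among the children of one parent, pick by priority, falling back
 * to size. */
static struct memcg *pick_child(struct memcg *parent)
{
	struct memcg *victim = NULL;
	int i;

	for (i = 0; i < parent->nr_children; i++) {
		struct memcg *c = parent->children[i];

		if (!victim ||
		    c->oom_priority > victim->oom_priority ||
		    (c->oom_priority == victim->oom_priority &&
		     c->usage > victim->usage))
			victim = c;
	}
	return victim;
}

static struct memcg *select_victim(struct memcg *root)
{
	struct memcg *pos = root;

	/* descend level by level; a delegated subtree's internal
	 * priorities only matter once its ancestor wins at every
	 * level above it */
	while (pos->nr_children)
		pos = pick_child(pos);
	return pos;
}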

I'm not against adding something which works, but strict
level-by-level comparison isn't the solution.

Thanks.

--
tejun