[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171002192814.sad75tqklp3nmr4m@dhcp22.suse.cz>
Date: Mon, 2 Oct 2017 21:28:14 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Shakeel Butt <shakeelb@...gle.com>
Cc: Tim Hockin <thockin@...kin.org>, Roman Gushchin <guro@...com>,
Johannes Weiner <hannes@...xchg.org>,
Tejun Heo <tj@...nel.org>, kernel-team@...com,
David Rientjes <rientjes@...gle.com>,
Linux MM <linux-mm@...ck.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Andrew Morton <akpm@...ux-foundation.org>,
Cgroups <cgroups@...r.kernel.org>, linux-doc@...r.kernel.org,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [v8 0/4] cgroup-aware OOM killer
On Mon 02-10-17 12:00:43, Shakeel Butt wrote:
> > Yes and nobody is disputing that, really. I guess the main disconnect
> > here is that different people want to have more detailed control over
> > the victim selection while the patchset tries to handle the most
> > simplistic scenario when a no userspace control over the selection is
> > required. And I would claim that this will be a last majority of setups
> > and we should address it first.
>
> IMHO the disconnect/disagreement is which memcgs should be compared
> with each other for oom victim selection. Let's forget about oom
> priority and just take size into the account. Should the oom selection
> algorithm, compare the leaves of the hierarchy or should it compare
> siblings? For the single user system, comparing leaves makes sense
> while in a multi user system, siblings should be compared for victim
> selection.
THis is simply not true. This is not about single vs. multi user
systems. This is about how the memcg hierarchy is organized (please
have a look at the example I've provided previously). I would dare to
claim that comparing siblings is a weaker semantic just because it puts
stronger constrains on how the hierarchy is organized. Especially when
the cgrou v2 is single hierarchy based (so we cannot create intermediate
cgroup nodes for other controllers because we would automatically get
a cumulative memory consumption).
I am sorry to cut the rest of your proposal because it simply goes over
the scope of the proposed solution while the usecase you are mentioning
is still possible. If we want to compare intermediate nodes (which seems
to be the case) then we can always provide a knob to opt-in - be it your
oom_gang or others.
I am sorry but I would really appreciate to focus on making the step
1 done before diverging into details about potential improvements and a
better control over the selection. This whole thing is an opt-in so
there is a no risk of a regression.
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists