Message-ID: <CALvZod7iaOEeGmDJA0cZvJWpuzc-hMRn3PG2cfzcMniJtAjKqA@mail.gmail.com>
Date: Sun, 1 Oct 2017 16:29:48 -0700
From: Shakeel Butt <shakeelb@...gle.com>
To: Tim Hockin <thockin@...kin.org>
Cc: Roman Gushchin <guro@...com>, Michal Hocko <mhocko@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Tejun Heo <tj@...nel.org>, kernel-team@...com,
David Rientjes <rientjes@...gle.com>,
Linux MM <linux-mm@...ck.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Andrew Morton <akpm@...ux-foundation.org>,
Cgroups <cgroups@...r.kernel.org>, linux-doc@...r.kernel.org,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [v8 0/4] cgroup-aware OOM killer
>
> Going back to Michal's example, say the user configured the following:
>
>       root
>       /  \
>      A    D
>     / \
>    B   C
>
> A global OOM event happens and we find this:
> - A > D
> - B, C, D are oomgroups
>
> What the user is telling us is that B, C, and D are compound memory
> consumers. They cannot be divided into their task parts from a memory
> point of view.
>
> However, the user doesn't say the same for A: the A subtree summarizes
> and controls aggregate consumption of B and C, but without groupoom
> set on A, the user says that A is in fact divisible into independent
> memory consumers B and C.
>
> If we don't have to kill all of A, but we'd have to kill all of D,
> does it make sense to compare the two?
>
I think Tim has given a very clear explanation of why comparing A & D
makes perfect sense. However, I think the above example, a single-user
system where a user has designed and created the whole hierarchy and
then attaches different jobs/applications to different nodes in this
hierarchy, is also a valid scenario. One solution I can think of, to
cater to both scenarios, is to introduce a notion of 'bypass oom': do
not include a memcg in the oom comparison and instead include its
children in the comparison.

So, in the same example as above:

       root
       /  \
    A(b)   D
    /  \
   B    C

A is marked as bypass and thus B and C are to be compared to D. So,
for the single user scenario, all the internal nodes are marked
'bypass oom comparison' and oom_priority of the leaves has to be set
to the same value.
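
To make the bypass rule concrete, here is a minimal userspace sketch (not kernel code) of which nodes end up being compared at one level. `struct node`, `add_child()` and `collect_candidates()` are hypothetical names made up for illustration, and the sketch uses plain recursion where the pseudocode in this mail uses a queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN   4
#define MAX_CANDIDATES 16

/* Hypothetical stand-in for a memcg; not the kernel's struct mem_cgroup. */
struct node {
	const char *name;
	bool bypass_oom;	/* skip this node, compare its children instead */
	bool oom_group;		/* node is an indivisible memory consumer */
	struct node *child[MAX_CHILDREN];
	size_t nr_children;
};

static void add_child(struct node *parent, struct node *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	parent->child[parent->nr_children++] = child;
}

/*
 * Append to out[] the nodes that would be compared against each other
 * below 'parent'. A bypassed node that is not an oom_group and has
 * children is replaced by its children, recursively.
 * Returns the new number of entries in out[].
 */
static size_t collect_candidates(const struct node *parent,
				 const struct node **out, size_t n)
{
	for (size_t i = 0; i < parent->nr_children; i++) {
		const struct node *it = parent->child[i];

		if (it->bypass_oom && !it->oom_group && it->nr_children > 0) {
			n = collect_candidates(it, out, n);
			continue;
		}
		assert(n < MAX_CANDIDATES);
		out[n++] = it;
	}
	return n;
}
```

For the tree above, with A marked as bypass, calling collect_candidates() on root yields B, C and D, in that order.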

Below is the pseudo code of select_victim_memcg() based on this idea
and David's previous pseudo code. The calculation of the size of a
memcg is not well baked yet. I am working on it and I plan to have a
patch based on Roman's v9 "mm, oom: cgroup-aware OOM killer" patch.

	struct mem_cgroup *memcg = root_mem_cgroup;
	struct mem_cgroup *selected_memcg = root_mem_cgroup;
	struct mem_cgroup *low_memcg, *it;
	unsigned long low_priority;
	unsigned long prev_badness = memcg_oom_badness(memcg); // Roman's code
	LIST_HEAD(queue);

next_level:
	low_memcg = NULL;
	low_priority = ULONG_MAX;

next:
	for_each_child_of_memcg(it, memcg) {
		unsigned long prio = it->oom_priority;
		unsigned long badness = 0;

		/* Bypassed node: queue it so its children get compared instead. */
		if (it->bypass_oom && !it->oom_group &&
		    memcg_has_children(it)) {
			list_add(&it->oom_queue, &queue);
			continue;
		}

		if (prio > low_priority)
			continue;

		if (prio == low_priority) {
			/* for simplicity, need more thinking */
			badness = mem_cgroup_usage(it);
			if (badness < prev_badness)
				continue;
		}

		low_memcg = it;
		low_priority = prio;
		/* for simplicity */
		prev_badness = badness ?: mem_cgroup_usage(it);
	}

	/* Scan children of queued bypassed nodes; low_memcg is carried over. */
	if (!list_empty(&queue)) {
		memcg = list_last_entry(&queue, struct mem_cgroup, oom_queue);
		list_del(&memcg->oom_queue);
		goto next;
	}

	/* Descend into the selected memcg unless it must be killed as a whole. */
	if (low_memcg) {
		selected_memcg = memcg = low_memcg;
		prev_badness = 0;
		if (!low_memcg->oom_group)
			goto next_level;
	}

	if (selected_memcg->oom_group)
		oom_kill_memcg(selected_memcg);
	else
		oom_kill_process_from_memcg(selected_memcg);
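
For anyone who wants to play with the selection order outside the kernel, the following is a rough userspace model of the pseudocode above. Everything in it is an assumption made for illustration: `struct memcg`, its `usage` field (standing in for mem_cgroup_usage()), the fixed-size arrays, and the helper names. It also simplifies the tie-break: the first candidate at a level is always accepted instead of being compared against the root's badness.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_QUEUE    16

/* Hypothetical userspace model of a memcg; field names follow the pseudocode. */
struct memcg {
	const char *name;
	unsigned long oom_priority;	/* lower value is selected first */
	unsigned long usage;		/* stand-in for mem_cgroup_usage() */
	bool bypass_oom;
	bool oom_group;
	struct memcg *child[MAX_CHILDREN];
	size_t nr_children;
};

static void add_child(struct memcg *parent, struct memcg *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	parent->child[parent->nr_children++] = child;
}

/* Returns the memcg that would be handed to the kill step. */
static struct memcg *select_victim(struct memcg *root)
{
	struct memcg *memcg = root, *selected = root;

	for (;;) {
		struct memcg *queue[MAX_QUEUE];
		struct memcg *low = NULL;
		unsigned long low_priority = ULONG_MAX, low_usage = 0;
		size_t queued = 0;

		/* Scan this level, expanding bypassed nodes into their children. */
		queue[queued++] = memcg;
		while (queued > 0) {
			struct memcg *parent = queue[--queued];

			for (size_t i = 0; i < parent->nr_children; i++) {
				struct memcg *it = parent->child[i];

				if (it->bypass_oom && !it->oom_group &&
				    it->nr_children > 0) {
					assert(queued < MAX_QUEUE);
					queue[queued++] = it;
					continue;
				}
				if (it->oom_priority > low_priority)
					continue;
				/* Same priority: prefer the larger consumer. */
				if (low && it->oom_priority == low_priority &&
				    it->usage < low_usage)
					continue;
				low = it;
				low_priority = it->oom_priority;
				low_usage = it->usage;
			}
		}

		if (!low)
			return selected;	/* nothing below to descend into */
		selected = memcg = low;
		if (low->oom_group)
			return selected;	/* indivisible: kill as a whole */
	}
}
```

With equal priorities and usages of, say, A=60, B=25, C=35, D=40 (B, C, D being oom groups), marking A as bypass makes the model pick D, since B, C and D are compared directly. Without the bypass flag, A beats D at the top level and the model then descends into A and picks C.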