Message-ID: <20160721124144.GB21806@cmpxchg.org>
Date: Thu, 21 Jul 2016 08:41:44 -0400
From: Johannes Weiner <hannes@...xchg.org>
To: Vladimir Davydov <vdavydov@...tuozzo.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Michal Hocko <mhocko@...nel.org>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm: oom: deduplicate victim selection code for memcg
and global oom
Hi Vladimir,
Sorry for getting to this only now.
On Mon, Jun 27, 2016 at 07:39:54PM +0300, Vladimir Davydov wrote:
> When selecting an oom victim, we use the same heuristic for both memory
> cgroup and global oom. The only difference is the scope of tasks to
> select the victim from. So we could just export an iterator over all
> memcg tasks and keep all oom related logic in oom_kill.c, but instead we
> duplicate pieces of it in memcontrol.c reusing some initially private
> functions of oom_kill.c in order to not duplicate all of it. That looks
> ugly and error prone, because any modification of select_bad_process
> should also be propagated to mem_cgroup_out_of_memory.
>
> Let's rework this as follows: keep all oom heuristic related code
> private to oom_kill.c and make oom_kill.c use exported memcg functions
> when it's really necessary (like in case of iterating over memcg tasks).
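The exported-iterator idea in the quoted description can be modeled in a small standalone C sketch. This is illustrative only: the struct layout, the list representation, and the name memcg_scan_tasks are stand-ins, not the kernel API. The point is that the memcg code exports one task iterator while victim-selection policy stays private to the oom code:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: a memcg owns a list of tasks; the oom code walks them
 * through a single exported iterator instead of duplicating the walk. */
struct task { int pid; long badness; struct task *next; };
struct memcg { struct task *tasks; };

/* Hypothetical exported iterator: invokes fn on each task and stops
 * early if fn returns nonzero (e.g. "abort, found a dying task"). */
static int memcg_scan_tasks(struct memcg *mc,
                            int (*fn)(struct task *, void *), void *arg)
{
    for (struct task *t = mc->tasks; t; t = t->next) {
        int ret = fn(t, arg);
        if (ret)
            return ret;
    }
    return 0;
}

/* The selection heuristic lives on the oom side; the memcg side only
 * ever sees an opaque fn+arg pair. */
static int pick_worse(struct task *t, void *arg)
{
    struct task **victim = arg;

    if (!*victim || t->badness > (*victim)->badness)
        *victim = t;
    return 0; /* keep scanning */
}
```

With this split, changing the heuristic means touching only pick_worse; the iterator in the memcg code never needs to know.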
This approach, with the control flow in the OOM code, makes a lot of
sense to me. I think it's particularly useful in preparation for
supporting cgroup-aware OOM killing, where not just individual tasks
but entire cgroups are evaluated and killed as opaque memory units.
I'm thinking about doing something like the following, which should be
able to work regardless of the cgroup level (root, intermediate, or
leaf node) at which the OOM killer is invoked, and this patch works
toward it:
	struct oom_victim {
		bool is_memcg;
		union {
			struct task_struct *task;
			struct mem_cgroup *memcg;
		} entity;
		unsigned long badness;
	};
	oom_evaluate_memcg(oc, memcg, victim)
	{
		if (memcg == root) {
			for_each_memcg_process(p, memcg) {
				badness = oom_badness(oc, memcg, p);
				if (badness == some_special_value) {
					...
				} else if (badness > victim->badness) {
					victim->is_memcg = false;
					victim->entity.task = p;
					victim->badness = badness;
				}
			}
		} else {
			badness = 0;
			for_each_memcg_process(p, memcg) {
				b = oom_badness(oc, memcg, p);
				if (b == some_special_value)
					...
				else
					badness += b;
			}
			if (badness > victim->badness) {
				victim->is_memcg = true;
				victim->entity.memcg = memcg;
				victim->badness = badness;
			}
		}
	}
	oom()
	{
		struct oom_victim victim = {
			.badness = 0,
		};

		for_each_mem_cgroup_tree(memcg, oc->memcg)
			oom_evaluate_memcg(oc, memcg, &victim);

		if (!victim.badness && !is_sysrq_oom(oc)) {
			dump_header(oc, NULL);
			panic("Out of memory and no killable processes...\n");
		}

		if (victim.badness != -1) {
			oom_kill_victim(oc, &victim);
			schedule_timeout_killable(1);
		}

		return true;
	}
But even without that, with the unification of two identical control
flows and the privatization of a good amount of oom killer internals,
the patch speaks for itself.
> Signed-off-by: Vladimir Davydov <vdavydov@...tuozzo.com>
Acked-by: Johannes Weiner <hannes@...xchg.org>