Message-ID: <20160721124144.GB21806@cmpxchg.org>
Date:	Thu, 21 Jul 2016 08:41:44 -0400
From:	Johannes Weiner <hannes@...xchg.org>
To:	Vladimir Davydov <vdavydov@...tuozzo.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Michal Hocko <mhocko@...nel.org>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm: oom: deduplicate victim selection code for memcg
 and global oom

Hi Vladimir,

Sorry for getting to this only now.

On Mon, Jun 27, 2016 at 07:39:54PM +0300, Vladimir Davydov wrote:
> When selecting an oom victim, we use the same heuristic for both memory
> cgroup and global oom. The only difference is the scope of tasks to
> select the victim from. So we could just export an iterator over all
> memcg tasks and keep all oom related logic in oom_kill.c, but instead we
> duplicate pieces of it in memcontrol.c reusing some initially private
> functions of oom_kill.c in order to not duplicate all of it. That looks
> ugly and error prone, because any modification of select_bad_process
> should also be propagated to mem_cgroup_out_of_memory.
> 
> Let's rework this as follows: keep all oom heuristic related code
> private to oom_kill.c and make oom_kill.c use exported memcg functions
> when it's really necessary (like in case of iterating over memcg tasks).

This approach, with the control flow in the OOM code, makes a lot of
sense to me. I think it's particularly useful in preparation for
supporting cgroup-aware OOM killing, where not just individual tasks
but entire cgroups are evaluated and killed as opaque memory units.

I'm thinking about doing something like the following, which should
work regardless of the cgroup level (root, intermediate, or leaf node)
at which the OOM killer is invoked, and this patch works toward it:

struct oom_victim {
        bool is_memcg;
        union {
                struct task_struct *task;
                struct mem_cgroup *memcg;
        } entity;
        unsigned long badness;
};

oom_evaluate_memcg(oc, memcg, victim)
{
        if (memcg == root) {
                /* Tasks in the root group compete individually. */
                for_each_memcg_process(p, memcg) {
                        badness = oom_badness(oc, memcg, p);
                        if (badness == some_special_value) {
                                ...
                        } else if (badness > victim->badness) {
                                victim->is_memcg = false;
                                victim->entity.task = p;
                                victim->badness = badness;
                        }
                }
        } else {
                /* Any other group competes as one unit: sum its tasks. */
                badness = 0;
                for_each_memcg_process(p, memcg) {
                        b = oom_badness(oc, memcg, p);
                        if (b == some_special_value)
                                ...
                        else
                                badness += b;
                }
                if (badness > victim->badness) {
                        victim->is_memcg = true;
                        victim->entity.memcg = memcg;
                        victim->badness = badness;
                }
        }
}

oom()
{
        struct oom_victim victim = {
                .badness = 0,
        };

        for_each_mem_cgroup_tree(memcg, oc->memcg)
                oom_evaluate_memcg(oc, memcg, &victim);

        if (!victim.badness && !is_sysrq_oom(oc)) {
                dump_header(oc, NULL);
                panic("Out of memory and no killable processes...\n");
        }

        if (victim.badness != -1) {
                oom_kill_victim(oc, &victim);
                schedule_timeout_killable(1);
        }

        return true;
}
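
To make the selection rule above concrete, here is a minimal userspace
sketch in plain C. It is an illustration under stated assumptions, not
kernel code: struct task, struct group, evaluate_group() and
select_victim() are hypothetical stand-ins, the per-task badness is a
precomputed number rather than the result of oom_badness(), and the
special-value handling that the pseudocode elides is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for task_struct and mem_cgroup. */
struct task {
        const char *name;
        unsigned long badness;  /* precomputed; stands in for oom_badness() */
};

struct group {
        const char *name;
        bool is_root;
        const struct task *tasks;
        size_t nr_tasks;
};

/* Same shape as the proposed struct oom_victim. */
struct oom_victim {
        bool is_group;
        union {
                const struct task *task;
                const struct group *group;
        } entity;
        unsigned long badness;
};

/*
 * Root group: each task competes on its own.
 * Any other group: the sum of its tasks competes as one unit.
 */
static void evaluate_group(const struct group *g, struct oom_victim *victim)
{
        unsigned long sum = 0;
        size_t i;

        assert(g != NULL && victim != NULL);

        for (i = 0; i < g->nr_tasks; i++) {
                const struct task *t = &g->tasks[i];

                if (!g->is_root) {
                        sum += t->badness;
                        continue;
                }
                if (t->badness > victim->badness) {
                        victim->is_group = false;
                        victim->entity.task = t;
                        victim->badness = t->badness;
                }
        }

        if (!g->is_root && sum > victim->badness) {
                victim->is_group = true;
                victim->entity.group = g;
                victim->badness = sum;
        }
}

/* Walk all groups and return the worst entity; badness 0 means none found. */
static struct oom_victim select_victim(const struct group *groups, size_t nr)
{
        struct oom_victim victim = { .badness = 0 };
        size_t i;

        for (i = 0; i < nr; i++)
                evaluate_group(&groups[i], &victim);
        return victim;
}
```

For example, with a root group holding tasks of badness 10 and 40, a
group A holding two tasks of badness 30 each, and a group B holding one
task of badness 50, select_victim() picks group A as a whole (summed
badness 60), even though no single task in A is the worst one.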

But even without that, with the unification of two identical control
flows and the privatization of a good amount of oom killer internals,
the patch speaks for itself.
	
> Signed-off-by: Vladimir Davydov <vdavydov@...tuozzo.com>

Acked-by: Johannes Weiner <hannes@...xchg.org>
