Message-ID: <20170815121558.GA15892@castle.dhcp.TheFacebook.com>
Date:   Tue, 15 Aug 2017 13:15:58 +0100
From:   Roman Gushchin <guro@...com>
To:     David Rientjes <rientjes@...gle.com>
CC:     <linux-mm@...ck.org>, Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        Tejun Heo <tj@...nel.org>, <kernel-team@...com>,
        <cgroups@...r.kernel.org>, <linux-doc@...r.kernel.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [v5 2/4] mm, oom: cgroup-aware OOM killer

On Mon, Aug 14, 2017 at 03:42:54PM -0700, David Rientjes wrote:
> On Mon, 14 Aug 2017, Roman Gushchin wrote:
> > +
> > +static long oom_evaluate_memcg(struct mem_cgroup *memcg,
> > +			       const nodemask_t *nodemask)
> > +{
> > +	struct css_task_iter it;
> > +	struct task_struct *task;
> > +	int elegible = 0;
> > +
> > +	css_task_iter_start(&memcg->css, 0, &it);
> > +	while ((task = css_task_iter_next(&it))) {
> > +		/*
> > +		 * If there are no tasks, or all tasks have oom_score_adj set
> > +		 * to OOM_SCORE_ADJ_MIN and oom_kill_all_tasks is not set,
> > +		 * don't select this memory cgroup.
> > +		 */
> > +		if (!elegible &&
> > +		    (memcg->oom_kill_all_tasks ||
> > +		     task->signal->oom_score_adj != OOM_SCORE_ADJ_MIN))
> > +			elegible = 1;
> 
> I'm curious about the decision made in this conditional and how 
> oom_kill_memcg_member() ignores task->signal->oom_score_adj.  It means 
> that memory.oom_kill_all_tasks overrides /proc/pid/oom_score_adj if it 
> should otherwise be disabled.
> 
> It's undocumented in the changelog, but I'm questioning whether it's the 
> right decision.  Doesn't it make sense to kill all tasks that are not oom 
> disabled, and allow the user to still protect certain processes by their 
> /proc/pid/oom_score_adj setting?  Otherwise, there's no way to do that 
> protection without a sibling memcg and its own reservation of memory.  I'm 
> thinking about a process that governs jobs inside the memcg and if there 
> is an oom kill, it wants to do logging and any cleanup necessary before 
> exiting itself.  It seems like a powerful combination if coupled with oom 
> notification.

Good question!
I think the ability to override any oom_score_adj value and get all tasks
killed is more important than the ability to kill all processes with some
exceptions.

In your example someone still needs to look after the remaining process,
and kill it after some timeout if it does not quit by itself, right?

The special treatment of the -1000 value (without oom_kill_all_tasks)
is required only to avoid breaking existing setups.
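
To make the rule concrete, here is a minimal userspace sketch of the
eligibility check from the quoted hunk. It is not the patch code: the
model_task and model_memcg structures and the model_memcg_eligible() helper
are hypothetical stand-ins for the kernel types and the css_task_iter loop,
and only the condition itself is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Same value as the kernel's OOM_SCORE_ADJ_MIN (the "-1000" above). */
#define OOM_SCORE_ADJ_MIN (-1000)

/* Hypothetical stand-in for task->signal->oom_score_adj. */
struct model_task {
	int oom_score_adj;
};

/* Hypothetical stand-in for struct mem_cgroup and its task list. */
struct model_memcg {
	bool oom_kill_all_tasks;
	const struct model_task *tasks;
	size_t nr_tasks;
};

/*
 * A memcg is eligible if it has at least one task and either
 * oom_kill_all_tasks is set or some task is not OOM-disabled.
 * An empty memcg is never eligible, whatever the flag says.
 */
static bool model_memcg_eligible(const struct model_memcg *memcg)
{
	for (size_t i = 0; i < memcg->nr_tasks; i++) {
		if (memcg->oom_kill_all_tasks ||
		    memcg->tasks[i].oom_score_adj != OOM_SCORE_ADJ_MIN)
			return true;
	}
	return false;
}
```

With oom_kill_all_tasks clear, a memcg whose tasks are all at -1000 is
skipped, which matches the existing behaviour. With it set, the same memcg
becomes selectable, which is the override being discussed.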

Generally, oom_score_adj should be meaningful only at the cgroup level,
so extending it to the system level doesn't sound like a good idea.

> 
> Also, s/elegible/eligible/

Shame on me :)
Will fix, thanks!

> 
> Otherwise, looks good!

Great!
Thank you for reviewing and testing.

Roman
