Date:	Tue, 27 Jan 2009 02:53:00 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Nikanth Karthikesan <knikanth@...e.de>
cc:	Evgeniy Polyakov <zbr@...emap.net>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Alan Cox <alan@...rguk.ukuu.org.uk>,
	linux-kernel@...r.kernel.org,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Chris Snook <csnook@...hat.com>,
	Arve Hjønnevåg <arve@...roid.com>,
	Paul Menage <menage@...gle.com>,
	containers@...ts.linux-foundation.org,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Subject: Re: [RFC] [PATCH] Cgroup based OOM killer controller

On Tue, 27 Jan 2009, Nikanth Karthikesan wrote:

> > As previously stated, I think the heuristic to penalize tasks for not
> > having an intersection with the set of allowable nodes of the oom
> > triggering task could be made slightly more severe.  That's irrelevant to
> > your patch, though.
> >
> 
> But the heuristic makes it non-deterministic, unlike memcg case. And this 
> mandates special handling for cpuset constrained OOM conditions in this patch.
> 

Dividing a badness score by 8 if a task's set of allowable nodes does not 
intersect with the oom triggering task's set does not make an otherwise 
deterministic algorithm non-deterministic.
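
As a rough illustration of that point (a sketch only, not the kernel's code): 
the function name, the use of Python sets for node masks, and the sample 
scores below are all assumptions made up for this example.  The penalty is 
a pure function of its inputs, so the same task state always yields the 
same score.

```python
def adjusted_badness(points, task_nodes, trigger_nodes, penalty=8):
    """Apply the cpuset penalty to a badness score.

    points        -- badness score computed so far (non-negative int)
    task_nodes    -- set of memory nodes the candidate task may allocate from
    trigger_nodes -- set of memory nodes the oom triggering task may use
    penalty       -- divisor applied when the two sets are disjoint
                     (8 is the value discussed in this thread)

    All names here are hypothetical; this only models the arithmetic.
    """
    if task_nodes & trigger_nodes:
        # The sets intersect: killing this task may free usable memory,
        # so its score is left alone.
        return points
    # No intersection: the task is a poor candidate, so scale it down.
    return points // penalty


if __name__ == "__main__":
    trigger = {1, 2}
    print(adjusted_badness(800, {0, 1}, trigger))  # intersects on node 1
    print(adjusted_badness(800, {0}, trigger))     # disjoint, penalized
```

Making the heuristic "more severe" would only mean a larger divisor; the 
result would still be fully determined by the inputs.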

I don't understand what you're arguing for here.  Are you suggesting that 
we should not prefer tasks that intersect the set of allowable nodes?  
That makes no sense if the goal is to allow for future memory freeing.

> > We also talked about a cgroup /dev/mem_notify device file that you can
> > poll() and learn of low memory situations so that appropriate action can
> > be taken even in lowmem situations as opposed to simply oom conditions.
> >
> 
> Userspace also needs to handle the cpuset constrained _almost-oom_'s 
> specially? I wonder how easily userspace can handle that.
> 

It handles it very well: cpusets are a client of cgroups and the 
/dev/mem_notify extension would be as well.  So to handle lowmem 
notifications for your cpuset, you would mount both cgroup subsystems at 
the same time and then poll() on the mem_notify file.  It would be 
responsible for the aggregate of tasks that the cgroup represents.
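
From userspace, that could look roughly like the following.  This is only 
a sketch under assumptions: the mem_notify file does not exist yet, so its 
name, its location under the cgroup mount, and the idea that POLLIN signals 
a lowmem event are all hypothetical.  The demo uses a pipe as a stand-in 
for the file so that it can run anywhere poll() is available.

```python
import os
import select


def wait_for_lowmem(fd, timeout_ms):
    """Wait until fd becomes readable or the timeout expires.

    Returns True if a (hypothetical) lowmem event arrived, False on timeout.
    fd would be the opened mem_notify file of the cgroup of interest; here
    it can be any pollable file descriptor.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(mask & select.POLLIN for _, mask in events)


if __name__ == "__main__":
    # Stand-in for the notification file: the write end plays the kernel.
    read_fd, write_fd = os.pipe()
    try:
        if not wait_for_lowmem(read_fd, 10):
            print("no lowmem event yet")
        os.write(write_fd, b"x")  # simulate a notification
        if wait_for_lowmem(read_fd, 1000):
            print("lowmem event: apply userspace policy here")
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

Whatever the policy does on wakeup (freeing caches, killing a chosen task, 
and so on) stays entirely in userspace.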

If lowmem notifications are implemented in the reclaim path, this is much 
easier for cpusets than for the memory controller, actually, since we 
already collect per-node ZVC information.

> > These types of policy decisions belong in userspace.
> 
> Yes, policy decisions will be made in user-space using this oom-controller. 
> This is just a framework/means to enforce policies. We do not make any 
> decisions inside the kernel.
> 
> But yes, the badness calculation by the oom killer implements some kind of 
> policy inside the kernel, but I guess it can stay, as this oom-controller lets 
> user policy over-ride kernel policy. ;-)
> 

The goal should not be to override the kernel's choice, because that 
decision depends heavily on the type of oom and the state of the machine 
at the time.  Appropriate changes to the oom killer's heuristics are 
always welcome; in my opinion, we should probably penalize tasks that do 
not intersect the triggering task's set of allowable nodes more than we 
currently do.

I think you'll find that your goals can be accomplished with a mem_notify 
cgroup, and that it is a much more powerful interface: your userspace 
policy can be better informed, especially if it is aware of lowmem 
situations where oom conditions are imminent.