linux-kernel - Re: [RFC] [PATCH] Cgroup based OOM killer controller

Message-ID: <20090127215118.GA12431@ioremap.net>
Date:	Wed, 28 Jan 2009 00:51:18 +0300
From:	Evgeniy Polyakov <zbr@...emap.net>
To:	David Rientjes <rientjes@...gle.com>
Cc:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Alan Cox <alan@...rguk.ukuu.org.uk>, balbir@...ux.vnet.ibm.com,
	Nikanth Karthikesan <knikanth@...e.de>,
	containers@...ts.linux-foundation.org,
	linux-kernel@...r.kernel.org,
	Torvalds <torvalds@...ux-foundation.org>,
	Arve Hjønnevåg <arve@...roid.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Chris Snook <csnook@...hat.com>,
	Paul Menage <menage@...gle.com>
Subject: Re: [RFC] [PATCH] Cgroup based OOM killer controller

On Tue, Jan 27, 2009 at 12:37:21PM -0800, David Rientjes (rientjes@...gle.com) wrote:
> > Well, oom-killer can, since it drops unkillable state from the process
> > mask, that may be not enough though, but it tries more than userspace.
> > 
> 
> The only thing it does is send a SIGKILL and give the thread access to 
> memory reserves with TIF_MEMDIE, it doesn't drop any unkillable state.  If 

There is a small difference between force_sig_info() and the usual
send_signal() used by kill.

> its victim is hung in D state and the memory reserves do not allow it to 
> return to being runnable, this task will not die and the oom killer would 
> livelock unless given another target.

D-states are different. In the current tree we even have
lock_page_killable(), so it depends.

> > My main point was to have a way to monitor memory usage, so that any
> > process could tune its own behaviour according to that information. That
> > is not related to the system oom-killer at all. Thus /dev/mem_notify is
> > interesting first (and only) as a memory usage notification
> > interface and not as a way to invoke any kind of 'soft' oom-killer.
> 
> It's a way to prevent invoking the kernel oom killer by allowing userspace 
> notification of events where methods such as dropping caches, elevating 
> limits, adding nodes, sending signals, etc, can prevent such a problem.  
> When the system (or cgroup) is completely oom, it can also issue SIGKILLs 
> that will free some memory and preempt the oom killer from acting.
> 
> I think there might be some confusion about my proposal for extending 
> /dev/mem_notify.  Not only should it notify of certain low memory events, 
> but it should also allow userspace notification of oom events, just like 
> the cgroup oom notifier patch allowed.  Instead of attaching a task to a 
> cgroup file in that case, however, this would simply be the responsibility 
> of a task that has set up a poll() on the cgroup's mem_notify file.  A 
> configurable delay could be imposed so page allocation attempts simply 
> loop while the userspace handler responds and then only invoke the oom 
> killer when absolutely necessary.

I really have no objection to this, or to extending the oom-killer so
that it waits a bit in the allocation path while userspace makes some
progress. But do not drop the existing oom-killer (i.e. its ability to
kill processes) in favour of this new feature. Let's have both, and if
the extension fails for some reason, the old oom-killer will do the job.
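
For illustration only, a userspace handler of the kind discussed above
could look roughly like the sketch below. It assumes a notification file
that becomes readable on a low-memory or oom event; the /dev/mem_notify
path, that readiness convention, and the helper names open_notifier(),
wait_for_event() and handle_pressure() are all hypothetical, not an
existing kernel interface. The timeout stands in for the configurable
delay: if the handler does not act in time, the kernel oom-killer would
still be the fallback.

```python
import os
import select

# Hypothetical path: /dev/mem_notify comes from a proposed patch. Both the
# path and the "fd becomes readable on an event" convention are assumptions.
MEM_NOTIFY_PATH = "/dev/mem_notify"


def open_notifier(path=MEM_NOTIFY_PATH):
    """Open the (assumed) notification file read-only and return its fd."""
    return os.open(path, os.O_RDONLY)


def wait_for_event(fd, timeout_ms):
    """Return True if fd became readable within timeout_ms, else False."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    events = poller.poll(timeout_ms)
    return any(mask & select.POLLIN for _, mask in events)


def handle_pressure(fd, timeout_ms, reclaim):
    """Wait for one notification; on an event, run the caller's reclaim()."""
    if not wait_for_event(fd, timeout_ms):
        return "timeout"
    reclaim()  # e.g. drop caches or shed load; supplied by the caller
    return "handled"
```

A daemon would call handle_pressure(open_notifier(), delay_ms, reclaim)
in a loop, where reclaim is whatever policy the application prefers.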

-- 
	Evgeniy Polyakov