linux-kernel - RE: [PATCH 3.2.0-rc1 3/3] Used Memory Meter pseudo-device module

Message-ID: <alpine.DEB.2.00.1201111338320.21755@chino.kir.corp.google.com>
Date:	Wed, 11 Jan 2012 13:44:42 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	leonid.moiseichuk@...ia.com
cc:	gregkh@...e.de, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	cesarb@...arb.net, kamezawa.hiroyu@...fujitsu.com,
	emunson@...bm.net, penberg@...nel.org, aarcange@...hat.com,
	riel@...hat.com, mel@....ul.ie, dima@...roid.com,
	rebecca@...roid.com, san@...gle.com, akpm@...ux-foundation.org,
	vesa.jaaskelainen@...ia.com
Subject: RE: [PATCH 3.2.0-rc1 3/3] Used Memory Meter pseudo-device module

On Wed, 11 Jan 2012, leonid.moiseichuk@...ia.com wrote:

> > So if the page allocator can make no progress in freeing memory, we would
> > introduce a delay in out_of_memory() if it were configured via a sysctl from
> > userspace.  When this delay is started, applications waiting on this event can
> > be notified with eventfd(2) that the delay has started and they have
> > however many milliseconds to address the situation.  When they rewrite the
> > sysctl, the delay is cleared.  If they don't rewrite the sysctl and the delay
> > expires, the oom killer proceeds with killing.
> > 
> > What's missing for your use case with this proposal?
> 
> Timed delays in multi-process handling of OOM look to me like a fragile 
> construction, because the delays are not predictable.

Not sure what you mean by predictable; the oom conditions themselves 
certainly aren't predictable, otherwise you wouldn't need notification at 
all.  The delay is predictable since you configure it as a number of 
milliseconds via a global sysctl.  Userspace can either handle the oom 
itself and rewrite that sysctl to reset the delay, or write 0 to make the 
kernel invoke the oom killer immediately.  If the delay expires, then it 
is assumed that userspace is dead and the kernel will proceed with the 
kill to avoid livelock.
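
To make the intended semantics concrete, here is a toy model of the delay 
as described above.  It is only a sketch in Python, not kernel code: the 
class name, method names and return strings are all made up for 
illustration, and times are plain integers in milliseconds.

```python
class OomDelay:
    """Toy model of the proposed sysctl-driven oom delay (illustrative only)."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms   # value configured through the sysctl
        self.deadline_ms = None    # None means no delay is pending

    def enter_oom(self, now_ms):
        """The page allocator made no progress at time now_ms."""
        if self.delay_ms == 0:
            return "kill"          # no delay configured: kill right away
        self.deadline_ms = now_ms + self.delay_ms
        return "notify"            # waiters would be woken here

    def rewrite(self, value_ms):
        """Userspace rewrites the sysctl with value_ms."""
        pending = self.deadline_ms is not None
        self.delay_ms = value_ms
        self.deadline_ms = None    # any rewrite clears the pending delay
        if pending and value_ms == 0:
            return "kill"          # writing 0 asks for an immediate kill
        return "cleared"

    def tick(self, now_ms):
        """Check whether the pending delay has expired at time now_ms."""
        if self.deadline_ms is None:
            return "idle"
        if now_ms >= self.deadline_ms:
            self.deadline_ms = None
            return "kill"          # userspace assumed dead: proceed
        return "waiting"
```

The only timing input is the configured delay, so the latest point at 
which the oom killer runs is known as soon as the delay starts.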

> Memcg supports [1] a better approach: freeze the whole group and kick a 
> designated user-space application to handle it. We planned
> to use it as:
> - enlarge cgroup
> - send SIGTERM to selected "bad" application e.g. based on oom_score
> - wait a bit
> - send SIGKILL to "bad" application
> - reduce group size
> 
> But finally default OOM killer starts to work fine.
> 

I think you're misunderstanding the proposal; in the case of a global oom 
(that is, without memcg), all threads that are allocating memory would, 
by definition, be frozen and incur the delay at the point they would 
currently call into the oom killer.  If your userspace, i.e. the 
application responsible for managing oom killing, is alive, then it can 
wait on eventfd(2), wake up, and then send SIGTERM and SIGKILL to the 
appropriate threads based on priority.
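
A minimal sketch of what the userspace side could look like, in Python.  
How the descriptor gets registered with the kernel is not defined by this 
proposal, so the functions below simply take a readable fd and assume it 
behaves like an eventfd (each read returns an 8-byte counter in host byte 
order).  The function names and the grace period are hypothetical, and 
choosing the victim by priority is left to the caller.

```python
import os
import select
import signal
import sys
import time


def wait_for_oom_event(fd, timeout_s):
    """Wait until fd is readable; return its 8-byte counter, or None on timeout."""
    readable, _, _ = select.select([fd], [], [], timeout_s)
    if not readable:
        return None
    data = os.read(fd, 8)  # an eventfd read yields a host-order uint64
    return int.from_bytes(data, sys.byteorder)


def terminate_then_kill(pid, grace_s):
    """Send SIGTERM to pid, wait grace_s seconds, then send SIGKILL."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return  # already gone, nothing to do
    time.sleep(grace_s)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # exited and was reaped during the grace period
```

The grace period has to fit inside the configured delay, and the handler 
would rewrite the sysctl once it has freed memory; otherwise the delay 
expires and the kernel kills on its own.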

So, again, why wouldn't this work for you?