Message-ID: <alpine.DEB.2.02.1312182157510.1247@chino.kir.corp.google.com>
Date:	Wed, 18 Dec 2013 22:09:12 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Michal Hocko <mhocko@...e.cz>
cc:	Johannes Weiner <hannes@...xchg.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	cgroups@...r.kernel.org
Subject: Re: [patch 1/2] mm, memcg: avoid oom notification when current needs
 access to memory reserves

On Wed, 18 Dec 2013, Michal Hocko wrote:

> > For memory isolation, we'd only want to bypass memcg charges when 
> > absolutely necessary and it seems like TIF_MEMDIE is the only case where 
> > that's required.  We don't give processes with pending SIGKILLs or those 
> > in the exit() path access to memory reserves in the page allocator without 
> > first determining that reclaim can't make any progress for the same reason 
> > and then we only do so by setting TIF_MEMDIE when calling the oom killer.  
> 
> While I do understand arguments about isolation I would also like to be
> practical here. How many charges are we talking about? Dozen pages? Much
> more?

The PF_EXITING bypass is indeed much less concerning than the 
fatal_signal_pending() bypass.

> Besides that all of those should be very short lived because the task
> is going to die very soon and so the memory will be freed.
> 

We don't know how much memory is allocated while 
fatal_signal_pending() is true, before the process can handle the SIGKILL, 
so a significant amount of memory could potentially bypass the memcg 
charge.  If we are to have a configuration such as the one Tejun 
recommended for oom handling:

             _____root______
            /               \
        user                 oom
       /    \               /   \
      A      B             a     b

where the sum of the limits of A and B can exceed the limit of user to 
allow overcommit, and the limit of user is the amount of RAM minus whatever 
is reserved for the oom hierarchy, then significant bypass to the root 
memcg will leave memcgs in the oom hierarchy unable to allocate memory 
from the page allocator.
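
As a rough illustration of that arithmetic, here is a toy user-space model 
(not kernel code).  All sizes, limits, and names in it are made up for the 
example; it only shows that memory bypassed to the root memcg comes 
directly out of what was reserved for the oom hierarchy.

```python
# Toy model, not kernel code: all sizes (in MiB) and names are hypothetical.
RAM = 4096
OOM_RESERVE = 256                 # set aside for the oom hierarchy (a, b)
USER_LIMIT = RAM - OOM_RESERVE    # limit of the "user" memcg
LIMIT_A = 3000
LIMIT_B = 3000                    # LIMIT_A + LIMIT_B > USER_LIMIT: overcommit


def free_for_oom_hierarchy(user_charged, bypassed_to_root):
    """Return the physical memory left over for a and b.

    user_charged     -- memory charged to user (cannot exceed USER_LIMIT)
    bypassed_to_root -- memory allocated by tasks in user but accounted to
                        root because the charge was bypassed
    """
    user_charged = min(user_charged, USER_LIMIT)
    return max(RAM - user_charged - bypassed_to_root, 0)
```

With user at its limit and nothing bypassed, the full 256 MiB reserve is 
still there; once 300 MiB has been bypassed to root, the function returns 
0 and the oom hierarchy has nothing left to allocate.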

The PF_EXITING bypass is much less concerning because we shouldn't be 
doing significant memory allocation in the exit() path, but it's also true 
that neither the PF_EXITING nor the fatal_signal_pending() bypass is 
required.  In Tejun's suggested configuration above, we absolutely do want 
to reclaim from the user hierarchy before declaring oom and setting 
TIF_MEMDIE, otherwise the oom hierarchy cannot allocate.
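
The ordering argued for here can be sketched as a small user-space model.  
This is not the actual memcg charge path: Task, Memcg, and try_charge() 
are hypothetical names, and marking the charging task itself as the oom 
victim is a simplification.  It only shows that reclaim is tried first and 
that the only bypass is for a task that already has TIF_MEMDIE.

```python
from dataclasses import dataclass


@dataclass
class Task:
    tif_memdie: bool = False  # in this model, set only on the oom path


@dataclass
class Memcg:
    limit: int
    usage: int = 0
    reclaimable: int = 0      # part of usage that reclaim could free

    def reclaim(self, want):
        """Free up to `want` pages and return how many were freed."""
        freed = min(want, self.reclaimable)
        self.reclaimable -= freed
        self.usage -= freed
        return freed


def try_charge(memcg, task, pages):
    # The only bypass: the task was already chosen as an oom victim.
    if task.tif_memdie:
        return "bypass"
    # No early exit for exiting tasks or pending SIGKILLs: reclaim first.
    while memcg.usage + pages > memcg.limit:
        excess = memcg.usage + pages - memcg.limit
        if memcg.reclaim(excess) == 0:
            # Reclaim made no progress, so declare oom.  Simplification:
            # the charging task stands in for whatever victim is picked.
            task.tif_memdie = True
            return "oom"
    memcg.usage += pages
    return "charged"
```

A task in a memcg that is at its limit but still has reclaimable pages 
gets "charged"; only once reclaim stops making progress does it get "oom", 
and only after that do its further charges take the "bypass" path.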

> So from my POV I would like to see these heuristics as simple as
> possible and placed at very few places. Doing a bypass before charge
> - or even after a failed charge before doing reclaim sounds like an easy
> enough heuristic without a big risk.

It carries a very significant risk of depleting the memory that is 
available for oom handling in the suggested configuration.