linux-kernel - Re: [RFC][PATCH] memcg: page fault oom improvement v2

Message-ID: <alpine.DEB.2.00.1002231738070.3435@chino.kir.corp.google.com>
Date:	Tue, 23 Feb 2010 17:42:33 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
cc:	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC][PATCH] memcg: page fault oom improvement v2

On Wed, 24 Feb 2010, KAMEZAWA Hiroyuki wrote:

> > I think it would be better to just remove mem_cgroup_out_of_memory() and 
> > make it go through out_of_memory() by specifying a non-NULL pointer to a 
> > struct mem_cgroup.  We don't need the duplication in code that these two 
> > functions have and then we can begin to have some consistency with how to 
> > deal with panic_on_oom.
> > 
> > It would be much better to prefer killing current in pagefault oom 
> > conditions, as the final patch in my oom killer rewrite does, if it is 
> > killable.  If not, we scan the tasklist and find another suitable 
> > candidate.  If current is bound to a memcg, we pass that to 
> > select_bad_process() so that we only kill other tasks from the same 
> > cgroup.
> Adding new argument to out_of_memory ?
> 

Right, the pointer to pass into select_bad_process() to filter by memcg.
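
To illustrate the filtering idea, here is a minimal userspace sketch, not the
kernel code.  The names struct task_sketch and select_bad_process_sketch(),
and the badness and killable fields, are hypothetical stand-ins; the only
point is that a NULL memcg pointer means a system-wide scan, while a non-NULL
pointer restricts the candidates to that cgroup.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct mem_cgroup. */
struct mem_cgroup {
    int id;
};

/* Hypothetical, simplified task descriptor (not struct task_struct). */
struct task_sketch {
    const char *comm;
    struct mem_cgroup *memcg;   /* NULL if the task is in no memcg */
    unsigned long badness;      /* higher means a better victim */
    int killable;               /* 0 for tasks that must not be killed */
};

/*
 * Pick the killable task with the highest badness.
 * mem == NULL: system-wide oom, every killable task is a candidate.
 * mem != NULL: memcg oom, only tasks attached to that memcg are candidates.
 */
struct task_sketch *select_bad_process_sketch(struct task_sketch *tasks,
                                              size_t n,
                                              struct mem_cgroup *mem)
{
    struct task_sketch *chosen = NULL;

    for (size_t i = 0; i < n; i++) {
        struct task_sketch *p = &tasks[i];

        if (!p->killable)
            continue;
        if (mem && p->memcg != mem)
            continue;   /* filter by memcg */
        if (!chosen || p->badness > chosen->badness)
            chosen = p;
    }
    return chosen;
}
```

With a single entry point taking this pointer, the memcg and global paths
could share the same selection loop instead of duplicating it.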

> > 
> > This allows us to hijack the TIF_MEMDIE bit to detect when there is a 
> > parallel pagefault oom killing when the oom killer hasn't necessarily been 
> > invoked to kill a system-wide task (it's simply killing current, by 
> > default, and giving it access to memory reserves).  Then, we can change 
> > out_of_memory(), which also now handles memcg oom conditions, to always 
> > scan the tasklist first (including for mempolicy and cpuset constrained 
> > ooms), check for any candidates that have TIF_MEMDIE, and return 
> > ERR_PTR(-1UL) if so.  That catches the parallel pagefault oom conditions 
> > from needlessly killing memcg tasks.  panic_on_oom would only panic after 
> > the tasklist scan has completed and returned != ERR_PTR(-1UL), meaning 
> > pagefault ooms are exempt from that sysctl.
> > 
> Sorry, I see your concern but I'd like not to do clean-up and bug-fix at
> the same time.  
> 
> I think clean up after fix is easy in this case.
> 

If you develop on top of my oom killer rewrite, pagefault ooms already 
attempt to kill current first and then fall back to killing another task 
if current is unkillable.  That means that panic_on_oom must be redefined: 
we _must_ now scan the entire tasklist looking for eligible tasks with the 
TIF_MEMDIE bit set before panicking in _all_ oom conditions.  Otherwise, 
it is possible to needlessly panic when the result of a pagefault oom 
(killing current) would lead to future memory freeing.  The previous 
VM_FAULT_OOM behavior before we used the oom killer was to kill current; 
no consideration was given to panic_on_oom for those cases.  So 
pagefault_out_of_memory() must now try to kill current first and then 
leave panic_on_oom to be dealt with in out_of_memory() if the tasklist 
scan doesn't show any pagefault oom victims.
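
The ordering described above can be sketched in userspace C as follows.  This
is only an illustration under stated assumptions: struct oom_task, the
memdie flag (standing in for TIF_MEMDIE), the enum of outcomes, and both
*_sketch() functions are hypothetical, and a real panic or kill is replaced
by a return value.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes; the kernel would kill or panic instead. */
enum oom_result {
    OOM_KILLED_CURRENT,
    OOM_WAIT_FOR_VICTIM,   /* a task already has the MEMDIE flag set */
    OOM_PANIC,
    OOM_KILLED_OTHER,
    OOM_NO_VICTIM
};

/* Hypothetical, simplified task descriptor. */
struct oom_task {
    bool killable;
    bool memdie;            /* stands in for TIF_MEMDIE */
    unsigned long badness;
};

/*
 * Scan the whole task list first.  If any task is already dying from an
 * earlier oom kill, back off.  Only after the scan finds no such task is
 * panic_on_oom honored.
 */
enum oom_result out_of_memory_sketch(struct oom_task *tasks, size_t n,
                                     bool panic_on_oom)
{
    struct oom_task *chosen = NULL;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].memdie)
            return OOM_WAIT_FOR_VICTIM;
        if (!tasks[i].killable)
            continue;
        if (!chosen || tasks[i].badness > chosen->badness)
            chosen = &tasks[i];
    }

    if (panic_on_oom)
        return OOM_PANIC;
    if (!chosen)
        return OOM_NO_VICTIM;

    chosen->memdie = true;
    return OOM_KILLED_OTHER;
}

/* Pagefault oom: prefer killing current, else defer to the full scan. */
enum oom_result pagefault_out_of_memory_sketch(struct oom_task *tasks,
                                               size_t n, size_t current,
                                               bool panic_on_oom)
{
    if (tasks[current].killable) {
        tasks[current].memdie = true;
        return OOM_KILLED_CURRENT;
    }
    return out_of_memory_sketch(tasks, n, panic_on_oom);
}
```

In this sketch a pagefault oom with a killable current never reaches the
panic_on_oom check, and a later oom that finds that victim during the scan
backs off rather than panicking.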