linux-kernel - Re: + x86-mm-handle-mm_fault_error-in-kernel-space.patch added to -mm tree

Message-Id: <20110313182137.4119.A69D9226@jp.fujitsu.com>
Date:	Sun, 13 Mar 2011 19:29:17 +0900 (JST)
From:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
To:	Oleg Nesterov <oleg@...hat.com>
Cc:	kosaki.motohiro@...fujitsu.com, Andrew Vagin <avagin@...il.com>,
	Pavel Emelyanov <xemul@...nvz.org>,
	Andrey Vagin <avagin@...nvz.org>, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Rientjes <rientjes@...gle.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	linux-kernel@...r.kernel.org, Nick Piggin <npiggin@...e.de>
Subject: Re: + x86-mm-handle-mm_fault_error-in-kernel-space.patch added to -mm tree

> On 03/11, Oleg Nesterov wrote:
> >
> > On 03/11, Andrew Vagin wrote:
> > >
> > >
> > The point is, if current was _NOT_ killed we should follow the current
> > pagefault_out_of_memory() logic or remove pagefault_out_of_memory()
> > completely.
> 
> Yes, and I still think this is valid. And thus I still think the patch
> should be changed (btw, this problem is not x86 specific).
> 
> However,
> 
> > >> Why do you think the current task should be killed? In this case we
> > >> do not need oom-killer at all, we could always kill the caller of
> > >> alloc_page/etc.
> > >
> > > You don't understand. alloc_page calls oom-killer himself, then try
> > > allocate memory again. Pls look at __alloc_pages_slowpath().
> > > __alloc_pages_slowpath may fail if order > 3 || gfp_mask & __GFP_NOFAIL
> > > || test_thread_flag(TIF_MEMDIE)
> >
> > Andrew, please, I know this.
> 
> Hmm. It turns out I do not ;)
> 
> I thought I can find the case when handle_mm_fault() returns VM_FAULT_OOM
> and the caller is not killed, but I can't. I do not really understand
> mem_cgroup_handle_oom/etc, but it seems we always retry indefinitely even
> with mem_cgroup's. mm/hugetlb.c looks fine too...
> 
> So, I have to apologize, I am starting to think you are right.
> 
> Maybe someone could explain why pagefault_out_of_memory() is still
> needed?

Hi Oleg, Andrew,

Now you are seeing the dark side of the VM. ;-)
Two independent commits introduced this hard-to-understand code.

	commit 1c0fe6e3bda0464728c23c8d84aa47567e8b716c
	Author: Nick Piggin <npiggin@...e.de>
	Date:   Tue Jan 6 14:38:59 2009 -0800

	    mm: invoke oom-killer from page fault

	commit 6583bb64fc370842b32a87c67750c26f6d559af0
	Author: David Rientjes <rientjes@...gle.com>
	Date:   Wed Jul 29 15:02:06 2009 -0700

	    mm: avoid endless looping for oom killed tasks

The most typical case is, as Andrew described, handle_mm_fault -> pte_alloc_one
-> alloc_pages_current(GFP_KERNEL, 0). An order-0 GFP_KERNEL allocation
never fails unless the task has received TIF_MEMDIE. Therefore, in this case,
no additional pagefault_out_of_memory() call is needed. Anyway,
pagefault_out_of_memory() is a no-op if the task already has TIF_MEMDIE.
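
To make the rule concrete, here is a rough user-space sketch of when an
allocation may fail, as described in this thread. It is not kernel code:
model_alloc_may_fail() and MODEL_COSTLY_ORDER are hypothetical names, the
"order > 3" threshold is taken from Andrew's quoted text, and the real
__alloc_pages_slowpath() checks more conditions than this.

```c
#include <stdbool.h>

/* Hypothetical threshold for "large" allocations, taken from the
 * "order > 3" condition quoted in this thread. */
#define MODEL_COSTLY_ORDER 3u

/*
 * Simplified model, not the kernel's logic:
 *   memdie   - models TIF_MEMDIE being set on the current task
 *   can_wait - true for a GFP_KERNEL-like request, false for GFP_ATOMIC-like
 *   order    - allocation order
 * Returns true if the modeled allocation is allowed to fail.
 */
bool model_alloc_may_fail(bool memdie, bool can_wait, unsigned int order)
{
	if (memdie)
		return true;	/* OOM victim: no endless retry */
	if (!can_wait)
		return true;	/* atomic request cannot reclaim or retry */
	if (order > MODEL_COSTLY_ORDER)
		return true;	/* large request may give up */
	return false;		/* small sleeping request keeps retrying */
}
```

In this model the pte_alloc_one() case corresponds to
model_alloc_may_fail(memdie, true, 0), which is true only when memdie is set.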

But we have no guarantee that the page fault path contains neither large
allocations nor GFP_ATOMIC allocations. Therefore I think Oleg's patch pointed
out the right thing. The protocol is: vma->vm_ops->fault() can return
VM_FAULT_OOM, and if it does, the page fault handler should invoke the
out-of-memory handling.
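
The protocol can be sketched as a small user-space model. This is only an
illustration under assumptions: every pf_model_* name and the flag value are
hypothetical stand-ins for vma->vm_ops->fault(), VM_FAULT_OOM and
pagefault_out_of_memory(), and the real OOM killer is reduced to a counter.

```c
#include <stdbool.h>

/* Illustrative flag value, standing in for VM_FAULT_OOM. */
#define PF_MODEL_FAULT_OOM 0x1u

struct pf_model_task {
	bool memdie;	/* models TIF_MEMDIE */
	int oom_calls;	/* how many times OOM handling actually ran */
};

/* Models vma->vm_ops->fault(): returns fault flags. */
typedef unsigned int (*pf_model_fault_fn)(void);

static unsigned int pf_model_fault_ok(void)  { return 0; }
static unsigned int pf_model_fault_oom(void) { return PF_MODEL_FAULT_OOM; }

/* Models pagefault_out_of_memory(): a no-op for a task already marked. */
static void pf_model_out_of_memory(struct pf_model_task *task)
{
	if (task->memdie)
		return;
	task->oom_calls++;
}

/* Models the page fault handler honoring the protocol. */
static unsigned int pf_model_handle_fault(struct pf_model_task *task,
					  pf_model_fault_fn fault)
{
	unsigned int ret = fault();

	if (ret & PF_MODEL_FAULT_OOM)
		pf_model_out_of_memory(task);
	return ret;
}

/* Convenience wrapper: run one modeled fault, report OOM invocations. */
int pf_model_oom_calls(bool memdie, bool fault_reports_oom)
{
	struct pf_model_task task = { .memdie = memdie, .oom_calls = 0 };

	pf_model_handle_fault(&task, fault_reports_oom ? pf_model_fault_oom
						       : pf_model_fault_ok);
	return task.oom_calls;
}
```

Here pf_model_oom_calls(false, true) counts one OOM invocation, while a task
that already has the modeled TIF_MEMDIE bit, or a fault that does not report
OOM, counts none.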

But I doubt a practical workload can observe the difference.


