lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090630074717.GA11980@wotan.suse.de>
Date:	Tue, 30 Jun 2009 09:47:17 +0200
From:	Nick Piggin <npiggin@...e.de>
To:	David Rientjes <rientjes@...gle.com>
Cc:	Mel Gorman <mel@....ul.ie>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	penberg@...helsinki.fi, arjan@...radead.org,
	linux-kernel@...r.kernel.org, cl@...ux-foundation.org
Subject: Re: upcoming kerneloops.org item: get_page_from_freelist

On Mon, Jun 29, 2009 at 04:35:26PM -0700, David Rientjes wrote:
> On Mon, 29 Jun 2009, Mel Gorman wrote:
> 
> > page-allocator: Ensure that processes that have been OOM killed exit the page allocator
> > 
> > Processes that have been OOM killed set the thread flag TIF_MEMDIE. A
> > process such as this is expected to exit the page allocator but in the
> > event it happens to have set __GFP_NOFAIL, it potentially loops forever.
> > 
> 
> That's not the expected behavior for TIF_MEMDIE, although your patch 
> certainly changes that.
> 
> Your patch is simply doing
> 
> 	if (test_thread_flag(TIF_MEMDIE))
> 		gfp_mask |= __GFP_NORETRY;
> 
> in the slowpath.
> 
> TIF_MEMDIE is supposed to allow allocations to succeed, not automatically 
> fail, so that it can quickly handle its SIGKILL without getting blocked in 
> the exit path seeking more memory.

Yes, it need to just ignore all watermarks, do not reclaim (we've
already decided reclaim will not work at this point), and return a
page if we have one otherwise NULL (unless GFP_NOFAIL is set).

 
> > This patch checks TIF_MEMDIE when deciding whether to loop again in the
> > page allocator. Such a process will now return NULL after direct reclaim
> > and OOM killing have both been considered as options. The potential
> > problem is that a __GFP_NOFAIL allocation can still return failure so
> > callers must still handle getting returned NULL.
> > 
> 
> All __GFP_NOFAIL allocations should ensure that alloc_pages() never 
> returns NULL.  Although it's unfortunate, that's the requirement that 
> callers have been guaranteed and until they are fixed, the page allocator 
> should respect it.

Yes.

Interesting thing is what to do when we have 0 pages left, we are
TIF_MEMDIE, and GFP_NOFAIL is set. Looping will most likely just
deadlock the system. Returning NULL will probably oops caller with
various locks held and then deadlock the system. It really needs to
punt back to the OOM killer so it can select another task. Until
then, maybe a simple panic would be reasonable? (it's *never* going
to hit anyone in practice I'd say, but if it does then a panic
would be better than lockup at least we know what the problem was).

 
> I disagree with this change because it unconditionally fails allocations 
> when a task has been oom killed, a scenario which should be the _highest_ 
> priority for allocations to succeed since it leads to future memory 

That's another interesting point. I do agree with you because that
would restore previous behaviour which got broken. But I wonder if
possibly it would be a better idea to fail all allocations? That
would a) protect reserves more, and b) probably quite likely to
exit the syscall *sooner* than if we try to satisfy all allocations.



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ