linux-kernel - Re: [patch -mm 8/9 v2] oom: avoid oom killer for lowmem allocations

Message-ID: <alpine.DEB.2.00.1002161609200.11952@chino.kir.corp.google.com>
Date:	Tue, 16 Feb 2010 16:21:11 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
cc:	Nick Piggin <npiggin@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Lubos Lunak <l.lunak@...e.cz>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [patch -mm 8/9 v2] oom: avoid oom killer for lowmem
 allocations

On Wed, 17 Feb 2010, KAMEZAWA Hiroyuki wrote:

> > On Wed, 17 Feb 2010, KAMEZAWA Hiroyuki wrote:
> > 
> > > > > > I'll add this check to __alloc_pages_may_oom() for the !(gfp_mask & 
> > > > > > __GFP_NOFAIL) path since we're all content with endlessly looping.
> > > > > 
> > > > > Thanks. Yes endlessly looping is far preferable to randomly oopsing
> > > > > or corrupting memory.
> > > > > 
> > > > 
> > > > Here's the new patch for your consideration.
> > > > 
> > > 
> > > Then, can we take kdump in this endlessly looping situation?
> > > 
> > > panic_on_oom=always + kdump can do that. 
> > > 
> > 
> > The endless loop is only helpful if something is going to free memory 
> > external to the current page allocation: either another task with 
> > __GFP_WAIT | __GFP_FS that invokes the oom killer, a task that frees 
> > memory, or a task that exits.
> > 
> > The most notable endless loop in the page allocator is the one when a task 
> > has been oom killed, gets access to memory reserves, and then cannot find 
> > a page for a __GFP_NOFAIL allocation:
> > 
> > 	do {
> > 		page = get_page_from_freelist(gfp_mask, nodemask, order,
> > 			zonelist, high_zoneidx, ALLOC_NO_WATERMARKS,
> > 			preferred_zone, migratetype);
> > 
> > 		if (!page && gfp_mask & __GFP_NOFAIL)
> > 			congestion_wait(BLK_RW_ASYNC, HZ/50);
> > 	} while (!page && (gfp_mask & __GFP_NOFAIL));
> > 
> > We don't expect any such allocations to happen during the exit path, but 
> > we could probably find some in the fs layer.
> > 
> > I don't want to check sysctl_panic_on_oom in the page allocator because it 
> > would start panicking the machine unnecessarily for the integrity 
> > metadata GFP_NOIO | __GFP_NOFAIL allocation, for any 
> > order > PAGE_ALLOC_COSTLY_ORDER, or for users who can't lock the zonelist 
> > for oom kill that wouldn't have panicked before.
> > 
> 
> Then, why don't you check high_zoneidx in oom_kill.c?
> 
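The quoted point, that the `__GFP_NOFAIL` loop only helps when something outside the allocating task frees memory, can be seen in a small userspace model. This is a sketch, not kernel code: `struct model_zone`, `model_get_page()`, `model_nofail_alloc()` and `exiting_task()` are hypothetical names. The zone is reduced to a free-page count that already includes the reserves unlocked by ALLOC_NO_WATERMARKS, and the callback stands in for whatever other tasks do while the allocator sleeps in `congestion_wait()`.

```c
#include <stdbool.h>

/* Hypothetical model of a zone: only a count of free pages, reserves included. */
struct model_zone {
	unsigned int free_pages;
};

/* Hypothetical stand-in for get_page_from_freelist(): take one free page. */
static bool model_get_page(struct model_zone *z)
{
	if (z->free_pages == 0)
		return false;
	z->free_pages--;
	return true;
}

/*
 * Hypothetical model of the quoted __GFP_NOFAIL loop.  The allocating task
 * cannot create free pages itself; each failed attempt only gives other
 * tasks (the callback) a chance to free memory or exit.  As in the kernel
 * loop there is no retry limit: it ends only when external progress makes
 * a page available.  Returns the number of failed attempts.
 */
static unsigned int model_nofail_alloc(struct model_zone *z,
				       void (*others_run)(struct model_zone *, void *),
				       void *ctx)
{
	unsigned int failed = 0;

	while (!model_get_page(z)) {
		failed++;
		/* In the kernel: congestion_wait(BLK_RW_ASYNC, HZ/50). */
		others_run(z, ctx);
	}
	return failed;
}

/* Hypothetical external actor: frees one page after *delay runs. */
static void exiting_task(struct model_zone *z, void *ctx)
{
	unsigned int *delay = ctx;

	if (*delay > 0 && --*delay == 0)
		z->free_pages++;
}
```

With no free pages and an `exiting_task()` that frees a page on its third run, `model_nofail_alloc()` fails three times and then succeeds. With a callback that never frees anything, it would spin forever, just like the kernel loop.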

out_of_memory() doesn't return a value to specify whether the page
allocator should retry the allocation or just return NULL; all of that
policy is kept in mm/page_alloc.c.  For high_zoneidx < ZONE_NORMAL, we want
to fail the allocation when !(gfp_mask & __GFP_NOFAIL) and call the oom
killer when it's __GFP_NOFAIL.
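The policy above, together with the cases mentioned earlier in the thread where the oom killer never runs, can be summarized as a predicate. This is a userspace sketch under stated assumptions, not the kernel's code: `model_calls_oom_killer()`, the `MODEL_GFP_*` bit values and the `got_zonelist_lock` parameter are hypothetical. Only the zone ordering (lowmem zones sort below ZONE_NORMAL) and `PAGE_ALLOC_COSTLY_ORDER` (3) follow the kernel, and special cases such as `__GFP_NORETRY` and `__GFP_THISNODE` are ignored.

```c
#include <stdbool.h>

/* Simplified stand-ins: zone order as in enum zone_type, flag bits made up. */
enum model_zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

#define MODEL_GFP_FS		0x1u	/* hypothetical value for __GFP_FS */
#define MODEL_GFP_NOFAIL	0x2u	/* hypothetical value for __GFP_NOFAIL */
#define PAGE_ALLOC_COSTLY_ORDER	3

/*
 * Hypothetical model: would the oom killer be invoked for an allocation
 * that has made no reclaim progress?
 */
static bool model_calls_oom_killer(unsigned int gfp_mask, unsigned int order,
				   enum model_zone_type high_zoneidx,
				   bool got_zonelist_lock)
{
	/* No __GFP_FS (e.g. GFP_NOIO): the oom killer is never called. */
	if (!(gfp_mask & MODEL_GFP_FS))
		return false;
	/* Another task already holds the zonelist for oom kill: back off. */
	if (!got_zonelist_lock)
		return false;
	if (!(gfp_mask & MODEL_GFP_NOFAIL)) {
		/* Killing a task will not help high-order allocations. */
		if (order > PAGE_ALLOC_COSTLY_ORDER)
			return false;
		/* New with this patch: lowmem allocations simply fail. */
		if (high_zoneidx < ZONE_NORMAL)
			return false;
	}
	/* __GFP_NOFAIL, or an ordinary lowmem-independent allocation. */
	return true;
}
```

The `false` results for missing `__GFP_FS`, high order and a contended zonelist correspond to the situations listed in the quoted objection to checking `sysctl_panic_on_oom` in the page allocator: the oom killer, and so any panic_on_oom handling inside it, is never reached for them.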
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1696,6 +1696,9 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
 		/* The OOM killer will not help higher order allocs */
 		if (order > PAGE_ALLOC_COSTLY_ORDER)
 			goto out;
+		/* The OOM killer does not needlessly kill tasks for lowmem */
+		if (high_zoneidx < ZONE_NORMAL)
+			goto out;
 		/*
 		 * GFP_THISNODE contains __GFP_NORETRY and we never hit this.
 		 * Sanity check for bare calls of __GFP_THISNODE, not real OOM.
@@ -1924,15 +1927,23 @@ rebalance:
 			if (page)
 				goto got_pg;
 
-			/*
-			 * The OOM killer does not trigger for high-order
-			 * ~__GFP_NOFAIL allocations so if no progress is being
-			 * made, there are no other options and retrying is
-			 * unlikely to help.
-			 */
-			if (order > PAGE_ALLOC_COSTLY_ORDER &&
-						!(gfp_mask & __GFP_NOFAIL))
-				goto nopage;
+			if (!(gfp_mask & __GFP_NOFAIL)) {
+				/*
+				 * The oom killer is not called for high-order
+				 * allocations that may fail, so if no progress
+				 * is being made, there are no other options and
+				 * retrying is unlikely to help.
+				 */
+				if (order > PAGE_ALLOC_COSTLY_ORDER)
+					goto nopage;
+				/*
+				 * The oom killer is not called for lowmem
+				 * allocations to prevent needlessly killing
+				 * innocent tasks.
+				 */
+				if (high_zoneidx < ZONE_NORMAL)
+					goto nopage;
+			}
 
 			goto restart;
 		}
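The second hunk decides what the slowpath does when reclaim made no progress and the oom path returned no page. A minimal userspace sketch of that decision, assuming hypothetical names (`model_after_failed_oom()`, `MODEL_RESTART`, `MODEL_NOPAGE`) and a made-up value for the `__GFP_NOFAIL` bit; the zone ordering and `PAGE_ALLOC_COSTLY_ORDER` (3) follow the kernel:

```c
#include <stdbool.h>

/* Simplified stand-ins: zone order as in enum zone_type, flag bit made up. */
enum model_zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL, ZONE_HIGHMEM, ZONE_MOVABLE };

#define MODEL_GFP_NOFAIL	0x2u	/* hypothetical value for __GFP_NOFAIL */
#define PAGE_ALLOC_COSTLY_ORDER	3

/* Hypothetical labels for "goto restart" and "goto nopage". */
enum model_next_step { MODEL_RESTART, MODEL_NOPAGE };

/* Models the patched hunk after __alloc_pages_may_oom() returned no page. */
static enum model_next_step model_after_failed_oom(unsigned int gfp_mask,
						   unsigned int order,
						   enum model_zone_type high_zoneidx)
{
	if (!(gfp_mask & MODEL_GFP_NOFAIL)) {
		/* The oom killer was not called for high-order allocations. */
		if (order > PAGE_ALLOC_COSTLY_ORDER)
			return MODEL_NOPAGE;
		/* Nor for lowmem allocations, so retrying is unlikely to help. */
		if (high_zoneidx < ZONE_NORMAL)
			return MODEL_NOPAGE;
	}
	/* __GFP_NOFAIL, or an allocation the oom killer may have helped. */
	return MODEL_RESTART;
}
```

The two hunks have to agree: with only the first one applied, a lowmem allocation without `__GFP_NOFAIL` would skip the oom killer and then keep taking `goto restart`, since nothing in the old check sends it to `nopage`.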