linux-kernel - Re: [Bug #14141] order 2 page allocation failures in iwlagn

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Thu, 15 Oct 2009 00:56:36 +0100
From:	Mel Gorman <mel@....ul.ie>
To:	Frans Pop <elendil@...net.nl>
Cc:	David Rientjes <rientjes@...gle.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	"Rafael J. Wysocki" <rjw@...k.pl>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Kernel Testers List <kernel-testers@...r.kernel.org>,
	Pekka Enberg <penberg@...helsinki.fi>,
	Reinette Chatre <reinette.chatre@...el.com>,
	Bartlomiej Zolnierkiewicz <bzolnier@...il.com>,
	Karol Lewandowski <karol.k.lewandowski@...il.com>,
	Mohamed Abbas <mohamed.abbas@...el.com>,
	"John W. Linville" <linville@...driver.com>, linux-mm@...ck.org
Subject: Re: [Bug #14141] order 2 page allocation failures in iwlagn

On Wed, Oct 14, 2009 at 08:34:56PM +0200, Frans Pop wrote:
> Some initial results; all negative I'm afraid.
> 

These are highly unlikely candidates. I say highly unlikely because they
are before the page allocator patches when your analysis indicated
things were ok.

Commit 70ac23c readahead: sequential mmap readahead
	This affects readahead for mmap() and could have an impact on the
	number of allocations made by the streaming IO. This might be
	generating more bursty network traffic in 2.6.31 than 2.6.30 and
	affecting the allocation apttern enough to cause problems

Commit 2fad6f5 readahead: enforce full readahead size on async mmap readahead
	Another readahead change that may affect the rate of network
	traffic being generated when streaming IO over the network

Commit 10be0b3 readahead: introduce context readahead algorithm
	By using readahead in more situations, it again may be affecting
	the burst rate of network traffic and the rate of GFP_ATOMIC arrivals

Commit 78dc583 vmscan: low order lumpy reclaim also should use PAGEOUT_IO_SYNC
	Very low probability that this is a problem, but it affects
	lumpy reclaim and so has to be considered. It's an awkward
	revert but I think the most important part is just to revert the
	condition that checks if congestion_wait() should be called or not

I relooked at the page allocator patches themselves just in case. Of the
patches in there, I came up with

Commit 11e33f6 page allocator: break up the allocator entry point into fast and slow paths
	This is possibly the most disruptive patch in the set. It should
	not have affected behaviour but the complexity of the patch is
	quite high. I did spot an oddity whereby a process exiting making
	a __GFP_NOFAIL allocation can ignore watermarks. It's unlikely
	this is the problem but as the journal layer uses __GFP_NOFAIL,
	you never know - it might be pushing things down low enough for
	other watermark checks to fail. Patch is below. This is also the
	patch that cause kswapd to wake up less. I sent a patch for that
	problem but I still don't know if it reduced the number of
	failures for you or not.

Commit f2260e6 page allocator: update NR_FREE_PAGES only as necessary
	This patch affects the timing of when NR_FREE_PAGES is updated.
	The reclaim algorithm makes decisions based on this NR_FREE_PAGES
	value.	Crucially, the value can determine if the anon list is force
	scanned or not. The window during which this can make a difference
	should be extremely small but maybe it's enough to make a difference.

Outside the range of commits suspected of causing problems was the
following. It's extremely low probability

Commit 8aa7e84 Fix congestion_wait() sync/async vs read/write confusion
	This patch alters the call to congestion_wait() in the page
	allocator. Frankly, I don't get the change but it might worth
	checking if replacing BLK_RW_ASYNC with WRITE on top of 2.6.31
	makes any difference

After a lot more eyeballing, the best next candidate within mm is the
following patch. Should be tested on it's own and in combination with
the wakeup-kswapd patch sent before.

====

>From 4e8b5217f51a00caee527e4e8d8e46fe9f82b482 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mel@....ul.ie>
Date: Thu, 15 Oct 2009 00:17:05 +0100
Subject: [PATCH] page allocator: Direct reclaim should always obey watermarks

ALLOC_NO_WATERMARKS should be cleared when trying to allocate from the
free-lists after a direct reclaim.

Signed-off-by: Mel Gorman <mel@....ul.ie>
--- 
 mm/page_alloc.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3694609..619933d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1920,7 +1920,8 @@ rebalance:
 	page = __alloc_pages_direct_reclaim(gfp_mask, order,
 					zonelist, high_zoneidx,
 					nodemask,
-					alloc_flags, preferred_zone,
+					alloc_flags & ~ALLOC_NO_WATERMARKS,
+					preferred_zone,
 					migratetype, &did_some_progress);
 	if (page)
 		goto got_pg;
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/