linux-kernel - Re: [PATCH] mm: Do not stall in synchronous compaction for THP allocations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.00.1111151554190.3781@chino.kir.corp.google.com>
Date:	Tue, 15 Nov 2011 16:07:08 -0800 (PST)
From:	David Rientjes <rientjes@...gle.com>
To:	Mel Gorman <mgorman@...e.de>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Minchan Kim <minchan.kim@...il.com>, Jan Kara <jack@...e.cz>,
	Andy Isaacson <adi@...apodia.org>,
	Johannes Weiner <jweiner@...hat.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH] mm: Do not stall in synchronous compaction for THP
 allocations

On Tue, 15 Nov 2011, Mel Gorman wrote:

> Adding sync here could obviously be implemented although it may
> require both always-sync and madvise-sync. Alternatively, something
> like an options file could be created to create a bitmap similar to
> what ftrace does. Whatever the mechanism, it exposes the fact that
> "sync compaction" is used. If that turns out to be not enough, then
> you may want to add other steps like aggressively reclaiming memory
> which also potentially may need to be controlled via the sysfs file
> and this is the slippery slope.
> 

So what's being proposed here in this patch is the fifth time this line 
has been changed and its always been switched between true and !(gfp_mask 
& __GFP_NO_KSWAPD).  Instead of changing it every few months, I'd suggest 
that we tie the semantics of the tunable directly to sync_compaction since 
we're primarily targeting thp hugepages with this change anyway for the 
"always" case.  Comments?

diff --git a/Documentation/vm/transhuge.txt b/Documentation/vm/transhuge.txt
--- a/Documentation/vm/transhuge.txt
+++ b/Documentation/vm/transhuge.txt
@@ -116,6 +116,13 @@ echo always >/sys/kernel/mm/transparent_hugepage/defrag
 echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
 echo never >/sys/kernel/mm/transparent_hugepage/defrag
 
+If defrag is set to "always", then all hugepage allocations also attempt
+synchronous memory compaction which makes the allocation as aggressive
+as possible.  The overhead of attempting to allocate the hugepage is
+considered acceptable because of the longterm benefits of the hugepage
+itself at runtime.  If the VM should fallback to using regular pages
+instead, then you should use "madvise" or "never".
+
 khugepaged will be automatically started when
 transparent_hugepage/enabled is set to "always" or "madvise, and it'll
 be automatically shutdown if it's set to "never".

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2168,7 +2168,17 @@ rebalance:
 					sync_migration);
 	if (page)
 		goto got_pg;
-	sync_migration = true;
+
+	/*
+	 * Do not use synchronous migration for transparent hugepages unless
+	 * defragmentation is always attempted for such allocations since it
+	 * can stall in writeback, which is far worse than simply failing to
+	 * promote a page.  Otherwise, we really do want a hugepage and are as
+	 * aggressive as possible to allocate it.
+	 */
+	sync_migration = !(gfp_mask & __GFP_NO_KSWAPD) ||
+			(transparent_hugepage_flags &
+				(1 << TRANSPARENT_HUGEPAGE_DEFRAG_FLAG));
 
 	/* Try direct reclaim and then allocating */
 	page = __alloc_pages_direct_reclaim(gfp_mask, order,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/