linux-kernel - [PATCH 1/1] mm: numa: Quickly fail allocations for NUMA balancing on full nodes

Message-Id: <1456234791-31502-1-git-send-email-mgorman@techsingularity.net>
Date:	Tue, 23 Feb 2016 13:39:51 +0000
From:	Mel Gorman <mgorman@...hsingularity.net>
To:	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Vlastimil Babka <vbabka@...e.cz>,
	Johannes Weiner <hannes@...xchg.org>,
	David Rientjes <rientjes@...gle.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>,
	Mel Gorman <mgorman@...hsingularity.net>
Subject: [PATCH 1/1] mm: numa: Quickly fail allocations for NUMA balancing on full nodes

Commit 4167e9b2cf10 ("mm: remove GFP_THISNODE") removed the
GFP_THISNODE flag combination due to its confusing semantics. It noted that
alloc_misplaced_dst_page() was one such user after the changes made by commit
e97ca8e5b864 ("mm: fix GFP_THISNODE callers and clarify"). Unfortunately,
when GFP_THISNODE was removed, users of alloc_misplaced_dst_page() started
waking kswapd and entering direct reclaim because the wrong GFP flags were
cleared: __GFP_IO and __GFP_FS were masked out instead of __GFP_RECLAIM. The
consequence is that workloads that used to fit into memory now have their
memory reclaimed and swapped out, which is what this patch addresses.

The problem can be demonstrated with "mutilate", a load generator that
exercises memcached, a memory object caching server. The configuration
uses 80% of memory and is run 3 times for varying numbers of clients. The
results on a 4-socket NUMA box are:

mutilate
                            4.4.0                 4.4.0
                          vanilla           numaswap-v1
Hmean    1      8394.71 (  0.00%)     8395.32 (  0.01%)
Hmean    4     30024.62 (  0.00%)    34513.54 ( 14.95%)
Hmean    7     32821.08 (  0.00%)    70542.96 (114.93%)
Hmean    12    55229.67 (  0.00%)    93866.34 ( 69.96%)
Hmean    21    39438.96 (  0.00%)    85749.21 (117.42%)
Hmean    30    37796.10 (  0.00%)    50231.49 ( 32.90%)
Hmean    47    18070.91 (  0.00%)    38530.13 (113.22%)

The metric is queries/second, where higher is better; each row is the
harmonic mean (Hmean) for the given number of clients. The results are well
outside the noise, and the reason for the improvement is obvious from
some of the vmstats:

                                 4.4.0          4.4.0
                               vanilla  numaswap-v1r1
Minor Faults                1929399272     2146148218
Major Faults                  19746529           3567
Swap Ins                      57307366           9913
Swap Outs                     50623229          17094
Allocation stalls                35909            443
DMA allocs                           0              0
DMA32 allocs                  72976349      170567396
Normal allocs               5306640898     5310651252
Movable allocs                       0              0
Direct pages scanned         404130893         799577
Kswapd pages scanned         160230174              0
Kswapd pages reclaimed        55928786              0
Direct pages reclaimed         1843936          41921
Page writes file                  2391              0
Page writes anon              50623229          17094

The vanilla kernel is swapping heavily, with large amounts of direct
reclaim and kswapd activity. The figures are aggregates over the whole run,
but the excessive reclaim is known to occur throughout the entire test.

Note that simple streaming anon/file memory consumers also hit this problem,
but it is less obvious: in those cases, kswapd is awake when it should
not be.

As there are at least two reclaim-related bugs out there, it is worth spelling
out the user-visible impact. This patch only addresses excessive reclaim on
NUMA hardware when the working set is larger than a single NUMA node. A
separate bug involves high kswapd CPU usage, but it has been reported against
laptops and other UMA hardware and is not addressed by this patch.

Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
Cc: stable@...r.kernel.org # v4.1+
---
 mm/migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 7890d0bb5e23..6d17e0ab42d4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1578,7 +1578,7 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 					 (GFP_HIGHUSER_MOVABLE |
 					  __GFP_THISNODE | __GFP_NOMEMALLOC |
 					  __GFP_NORETRY | __GFP_NOWARN) &
-					 ~(__GFP_IO | __GFP_FS), 0);
+					 ~__GFP_RECLAIM, 0);

 	return newpage;
 }
-- 
2.6.4