linux-kernel - [PATCH 10/10] mm, page_alloc: Only enforce watermarks for order-0 allocations

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1437379219-9160-11-git-send-email-mgorman@suse.com>
Date:	Mon, 20 Jul 2015 09:00:19 +0100
From:	Mel Gorman <mgorman@...e.com>
To:	Linux-MM <linux-mm@...ck.org>
Cc:	Johannes Weiner <hannes@...xchg.org>,
	Rik van Riel <riel@...hat.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Pintu Kumar <pintu.k@...sung.com>,
	Xishi Qiu <qiuxishi@...wei.com>, Gioh Kim <gioh.kim@....com>,
	LKML <linux-kernel@...r.kernel.org>,
	Mel Gorman <mgorman@...hsingularity.net>
Subject: [PATCH 10/10] mm, page_alloc: Only enforce watermarks for order-0 allocations

From: Mel Gorman <mgorman@...e.de>

The primary purpose of watermarks is to ensure that reclaim can always
make forward progress in PF_MEMALLOC context (kswapd and direct reclaim).
These assume that order-0 allocations are all that is necessary for
forward progress.

High-order watermarks serve a different purpose. Kswapd had no high-order
awareness before they were introduced (https://lkml.org/lkml/2004/9/5/9).
This was particularly important when there were high-order atomic requests.
The watermarks both gave kswapd awareness and made a reserve for those
atomic requests.

There are two important side-effects of this. The most important is that
a non-atomic high-order request can fail even though free pages are available
and the order-0 watermarks are ok. The second is that high-order watermark
checks are expensive as the free list counts up to the requested order must
be examined.

With the introduction of MIGRATE_HIGHATOMIC it is no longer necessary to
have high-order watermarks. Kswapd and compaction still need high-order
awareness which is handled by checking that at least one suitable high-order
page is free.

In kernel 4.2-rc1 running this workload on a single-node machine there
were 339574 allocation failures. With HighAtomic reserves, it drops to
28798 failures. With this patch applied, it drops to 9567 failures --
a 98% reduction compared to the vanilla kernel or 67% in comparison to
having high atomic reserves with watermark checking.

The one potential side-effect of this is that in a vanilla kernel, the
watermark checks may have kept a free page for an atomic allocation. Now,
we are 100% relying on the HighAtomic reserves and an early allocation to
have allocated them.  If the first high-order atomic allocation is after
the system is already heavily fragmented then it'll fail.

Signed-off-by: Mel Gorman <mgorman@...e.de>
---
 mm/page_alloc.c | 38 ++++++++++++++++++++++++--------------
 1 file changed, 24 insertions(+), 14 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e5755390a5e5..e756df60dba6 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2250,8 +2250,10 @@ static inline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
 #endif /* CONFIG_FAIL_PAGE_ALLOC */
 
 /*
- * Return true if free pages are above 'mark'. This takes into account the order
- * of the allocation.
+ * Return true if free base pages are above 'mark'. For high-order checks it
+ * will return true of the order-0 watermark is reached and there is at least
+ * one free page of a suitable size. Checking now avoids taking the zone lock
+ * to check in the allocation paths if no pages are free.
  */
 static bool __zone_watermark_ok(struct zone *z, unsigned int order,
 			unsigned long mark, int classzone_idx, int alloc_flags,
@@ -2259,7 +2261,7 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
 {
 	long min = mark;
 	int o;
-	long free_cma = 0;
+	const bool atomic = (alloc_flags & ALLOC_HARDER);
 
 	/* free_pages may go negative - that's OK */
 	free_pages -= (1 << order) - 1;
@@ -2271,7 +2273,7 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
 	 * If the caller is not atomic then discount the reserves. This will
 	 * over-estimate how the atomic reserve but it avoids a search
 	 */
-	if (likely(!(alloc_flags & ALLOC_HARDER)))
+	if (likely(!atomic))
 		free_pages -= z->nr_reserved_highatomic;
 	else
 		min -= min / 4;
@@ -2279,22 +2281,30 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
 #ifdef CONFIG_CMA
 	/* If allocation can't use CMA areas don't use free CMA pages */
 	if (!(alloc_flags & ALLOC_CMA))
-		free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
+		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
 #endif
 
-	if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
+	if (free_pages <= min + z->lowmem_reserve[classzone_idx])
 		return false;
-	for (o = 0; o < order; o++) {
-		/* At the next order, this order's pages become unavailable */
-		free_pages -= z->free_area[o].nr_free << o;
 
-		/* Require fewer higher order pages to be free */
-		min >>= 1;
+	/* order-0 watermarks are ok */
+	if (!order)
+		return true;
+
+	/* Check at least one high-order page is free */
+	for (o = order; o < MAX_ORDER; o++) {
+		struct free_area *area = &z->free_area[o];
+		int mt;
+
+		if (atomic && area->nr_free)
+			return true;
 
-		if (free_pages <= min)
-			return false;
+		for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
+			if (!list_empty(&area->free_list[mt]))
+				return true;
+		}
 	}
-	return true;
+	return false;
 }
 
 bool zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/