lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1518972475-11340-2-git-send-email-robert.m.harris@oracle.com>
Date:   Sun, 18 Feb 2018 16:47:55 +0000
From:   robert.m.harris@...cle.com
To:     linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        linux-doc@...r.kernel.org
Cc:     Jonathan Corbet <corbet@....net>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...e.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Kemi Wang <kemi.wang@...el.com>,
        David Rientjes <rientjes@...gle.com>,
        Yafang Shao <laoar.shao@...il.com>,
        Kangmin Park <l4stpr0gr4m@...il.com>,
        Mel Gorman <mgorman@...e.de>,
        Yisheng Xie <xieyisheng1@...wei.com>,
        Davidlohr Bueso <dave@...olabs.net>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Huang Ying <ying.huang@...el.com>,
        Vinayak Menon <vinmenon@...eaurora.org>,
        "Robert M. Harris" <robert.m.harris@...cle.com>
Subject: [PATCH 1/1] mm, compaction: correct the bounds of __fragmentation_index()

From: "Robert M. Harris" <robert.m.harris@...cle.com>

__fragmentation_index() calculates a value used to determine whether
compaction should be favoured over page reclaim in the event of allocation
failure.  The calculation itself is opaque and, on inspection, does not
match its existing description.  The function purports to return a value
between 0 and 1000, representing units of 1/1000.  Barring the case of a
pathological shortfall of memory, the lower bound is instead 500.  This is
significant because it is the default value of sysctl_extfrag_threshold,
i.e. the value below which compaction should be avoided in favour of page
reclaim for costly pages.

This patch implements and documents a modified version of the original
expression that returns a value in the range 0 <= index < 1000.  It amends
the default value of sysctl_extfrag_threshold to preserve the existing
behaviour.

Signed-off-by: Robert M. Harris <robert.m.harris@...cle.com>
---
 Documentation/sysctl/vm.txt |  2 +-
 mm/compaction.c             |  2 +-
 mm/vmstat.c                 | 47 +++++++++++++++++++++++++++++++++++----------
 3 files changed, 39 insertions(+), 12 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 5025ff9..384a78b 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -237,7 +237,7 @@ of memory, values towards 1000 imply failures are due to fragmentation and -1
 implies that the allocation will succeed as long as watermarks are met.
 
 The kernel will not compact memory in a zone if the
-fragmentation index is <= extfrag_threshold. The default value is 500.
+fragmentation index is <= extfrag_threshold. The default value is 0.
 
 ==============================================================
 
diff --git a/mm/compaction.c b/mm/compaction.c
index 10cd757..9db6ef4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1730,7 +1730,7 @@ static enum compact_result compact_zone_order(struct zone *zone, int order,
 	return ret;
 }
 
-int sysctl_extfrag_threshold = 500;
+int sysctl_extfrag_threshold;
 
 /**
  * try_to_compact_pages - Direct compact to satisfy a high-order allocation
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 40b2db6..013f1af 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1044,15 +1044,22 @@ static void fill_contig_page_info(struct zone *zone,
 }
 
 /*
- * A fragmentation index only makes sense if an allocation of a requested
- * size would fail. If that is true, the fragmentation index indicates
- * whether external fragmentation or a lack of memory was the problem.
- * The value can be used to determine if page reclaim or compaction
- * should be used
+ * If there is no block of at least the requested size, implying that an
+ * allocation would fail, then it might be possible to conjure one by
+ * compaction.  As this is expensive it is reserved for those cases in which
+ * there is a relatively high degree of fragmentation.  For low degrees, page
+ * reclaim is more appropriate since an allocation failure is more likely to be
+ * caused by a lack of memory.
+ *
+ * This function calculates an index in the range 0 to 1, expressed in units of
+ * 1/1000, indicating low and high fragmentation respectively.  The special
+ * value of -1 indicates that free blocks of sufficient size are available and
+ * that an allocation should therefore succeed.
  */
 static int __fragmentation_index(unsigned int order, struct contig_page_info *info)
 {
 	unsigned long requested = 1UL << order;
+	int result;
 
 	if (WARN_ON_ONCE(order >= MAX_ORDER))
 		return 0;
@@ -1060,17 +1067,37 @@ static int __fragmentation_index(unsigned int order, struct contig_page_info *in
 	if (!info->free_blocks_total)
 		return 0;
 
-	/* Fragmentation index only makes sense when a request would fail */
 	if (info->free_blocks_suitable)
 		return -1000;
 
 	/*
-	 * Index is between 0 and 1 so return within 3 decimal places
+	 * If the number of requested-size blocks that could be constructed if
+	 * all free blocks were compacted is
+	 *
+	 *	B = info->free_pages/requested
+	 *
+	 * then, conceptually, the number of fragments into which each
+	 * requested-size block has been split is
+	 *
+	 *	N = info->free_blocks_total/B
 	 *
-	 * 0 => allocation would fail due to lack of memory
-	 * 1 => allocation would fail due to fragmentation
+	 * In the least and most fragmented cases all free memory resides on
+	 * either the order - 1 free list or the base page free list
+	 * respecively, thus the range of this function is given by
+	 * 2 <= N <= requested.  The fragmentation index,
+	 *
+	 *	F = 1 - 2/N,
+	 *
+	 * has the more useful range of 0 < F <= 1.  In order to inhibit
+	 * compaction in the event of a pathological shortfall of memory this
+	 * function truncates and returns
+	 *
+	 *	F - 1/info->free_blocks_total
 	 */
-	return 1000 - div_u64( (1000+(div_u64(info->free_pages * 1000ULL, requested))), info->free_blocks_total);
+	result = 1000 - div_u64((1000 + (div_u64(info->free_pages * 2000ULL,
+			requested))), info->free_blocks_total);
+
+	return (result < 0) ? 0 : result;
 }
 
 /* Same as __fragmentation index but allocs contig_page_info on stack */
-- 
1.8.3.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ