Message-ID: <20100208165555.GD23680@csn.ul.ie>
Date:	Mon, 8 Feb 2010 16:55:55 +0000
From:	Mel Gorman <mel@....ul.ie>
To:	Christian Ehrhardt <ehrhardt@...ux.vnet.ibm.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	epasch@...ibm.com, SCHILLIG@...ibm.com,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	christof.schmitt@...ibm.com, thoss@...ibm.com, hare@...e.de,
	npiggin@...e.de
Subject: Re: Performance regression in scsi sequential throughput (iozone)
	due to "e084b - page-allocator: preserve PFN ordering when
	__GFP_COLD is set"

> <SNIP>
> The prototype patch for avoiding congestion_wait is below. I'll start
> work on a fallback-to-other-percpu-lists patch.
> 

And here is the prototype of the fallback-to-other-percpu-lists patch.
I'm afraid I've only managed to test it on qemu. My three test machines are
still occupied :(

==== CUT HERE ====
page allocator: Fall back to other per-cpu lists when the target list is empty and memory is low

When a per-cpu list of pages for a given migratetype is empty, the page
allocator is called to refill the PCP list. When memory is low, this can
push the process into direct reclaim even though it was not strictly
necessary, because pages were free for other migratetypes. Unconditionally
falling back to other PCP lists hurts the fragmentation-avoidance strategy,
which is also undesirable.

When the desired PCP list is empty, this patch checks whether any pages
remain on the PCP lists and whether refilling the list would leave the
zone's free pages at or below the low watermark, making direct reclaim
likely. If direct reclaim is unlikely, the PCP list is refilled to maintain
fragmentation-avoidance. Otherwise, a page from an alternative PCP list is
chosen to maintain performance and avoid direct reclaim.

Signed-off-by: Mel Gorman <mel@....ul.ie>
---
 mm/page_alloc.c |   37 ++++++++++++++++++++++++++++++++++---
 1 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8deb9d0..009d683 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1168,6 +1168,39 @@ void split_page(struct page *page, unsigned int order)
 		set_page_refcounted(page + i);
 }
 
+/* Decide whether to find an alternative PCP list or refill */
+static struct list_head *pcp_fallback(struct zone *zone,
+			struct per_cpu_pages *pcp,
+			int start_migratetype, int cold)
+{
+	int i;
+	int migratetype;
+	struct list_head *list;
+	long free_pages = zone_page_state(zone, NR_FREE_PAGES) - pcp->batch;
+
+	/*
+	 * If refilling the PCP list may result in direct reclaim, find
+	 * another PCP list with free pages, searching in the same order
+	 * as the fragmentation-avoidance fallback
+	 */
+	if (pcp->count && free_pages <= low_wmark_pages(zone)) {
+		for (i = 0; i < MIGRATE_PCPTYPES - 1; i++) {
+			migratetype = fallbacks[start_migratetype][i];
+			list = &pcp->lists[migratetype];
+
+			if (!list_empty(list))
+				return list;
+		}
+	}
+
+	/* Alternatively, we need to allocate more memory to the PCP lists */
+	list = &pcp->lists[start_migratetype];
+	pcp->count += rmqueue_bulk(zone, 0, pcp->batch, list,
+					start_migratetype, cold);
+
+	return list;
+}
+
 /*
  * Really, prep_compound_page() should be called from __rmqueue_bulk().  But
  * we cheat by calling it from here, in the order > 0 path.  Saves a branch
@@ -1193,9 +1226,7 @@ again:
 		list = &pcp->lists[migratetype];
 		local_irq_save(flags);
 		if (list_empty(list)) {
-			pcp->count += rmqueue_bulk(zone, 0,
-					pcp->batch, list,
-					migratetype, cold);
+			list = pcp_fallback(zone, pcp, migratetype, cold);
 			if (unlikely(list_empty(list)))
 				goto failed;
 		}
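
For anyone who wants to poke at the decision logic without booting a kernel,
here is a rough user-space model of what pcp_fallback() above decides. It is
only a sketch: struct zone_model, struct pcp_model, fallback_order[] and
pcp_fallback_model() are hypothetical names made up for illustration, the
lists are reduced to page counters, and the fallback order is assumed to
match the first two columns of the kernel's fallbacks[] table.

```c
#include <assert.h>

/* Hypothetical user-space model; none of this is kernel code. */
enum { UNMOVABLE, RECLAIMABLE, MOVABLE, PCPTYPES };

/* Assumed to mirror fallbacks[] restricted to the PCP migratetypes */
static const int fallback_order[PCPTYPES][PCPTYPES - 1] = {
	[UNMOVABLE]   = { RECLAIMABLE, MOVABLE },
	[RECLAIMABLE] = { UNMOVABLE, MOVABLE },
	[MOVABLE]     = { RECLAIMABLE, UNMOVABLE },
};

struct pcp_model {
	long count[PCPTYPES];	/* pages held on each per-type list */
	long batch;		/* pages moved per refill */
};

struct zone_model {
	long free_pages;	/* free pages not held on PCP lists */
	long low_wmark;		/* low watermark */
};

/*
 * Return the migratetype whose list should satisfy the allocation.
 * The starting list is refilled only when no fallback list is taken.
 */
static int pcp_fallback_model(struct zone_model *zone,
			struct pcp_model *pcp, int start)
{
	long total = 0;
	long free_after_refill = zone->free_pages - pcp->batch;
	long moved;
	int i;

	for (i = 0; i < PCPTYPES; i++)
		total += pcp->count[i];

	/* Refill would dip to the low watermark: borrow from another list */
	if (total > 0 && free_after_refill <= zone->low_wmark) {
		for (i = 0; i < PCPTYPES - 1; i++) {
			int type = fallback_order[start][i];

			if (pcp->count[type] > 0)
				return type;
		}
	}

	/* Otherwise refill the requested list from the zone */
	moved = pcp->batch < zone->free_pages ? pcp->batch : zone->free_pages;
	zone->free_pages -= moved;
	pcp->count[start] += moved;
	return start;
}
```

Calling pcp_fallback_model() on a zone just above its low watermark with
pages left only on the MOVABLE list should pick MOVABLE for an UNMOVABLE
request; with plenty of free pages it should refill UNMOVABLE instead.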