Message-ID: <20130108224313.GA13304@suse.de>
Date:	Tue, 8 Jan 2013 22:43:13 +0000
From:	Mel Gorman <mgorman@...e.de>
To:	Eric Wong <normalperson@...t.net>
Cc:	linux-mm@...ck.org, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org, Rik van Riel <riel@...hat.com>,
	Minchan Kim <minchan@...nel.org>,
	Eric Dumazet <eric.dumazet@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: ppoll() stuck on POLLIN while TCP peer is sending

On Mon, Jan 07, 2013 at 10:38:50PM +0000, Eric Wong wrote:
> Mel Gorman <mgorman@...e.de> wrote:
> > Right now it's difficult to see how the capture could be the source of
> > this bug but I'm not ruling it out either so try the following (untested
> > but should be ok) patch.  It's not a proper revert, it just disables the
> > capture page logic to see if it's at fault.
> 
> Things look good so far with your change.

Ok, so a minimal revert is an option as long as 2e30abd1 is preserved. The
original motivation for the patch was to improve allocation success rates
under load, but due to a bug in the patch the improvement most likely came
from compacting more for THP allocations.

> It's been running 2 hours on a VM and 1 hour on my regular machine.
> Will update again in a few hours (or sooner if it's stuck again).

After looking at it for long enough I found a number of problems. Most
only affect timing, but there are two serious issues. One affects how long
kswapd spends compacting versus reclaiming and the other increases lock
contention, which means that async compaction can abort early. Either one
could explain why a driver would fail high-order allocations.
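
To make the second issue concrete: asynchronous compaction does not wait on
a contended lock, it records the contention and aborts the scan, so anything
that adds lock traffic makes it give up sooner. A simplified stand-alone
sketch of that pattern (hypothetical names, not the actual mm/compaction.c
code):

#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct compact_control and the zone lock. */
struct toy_compact_control {
	bool sync;		/* false => asynchronous compaction */
	bool contended;		/* set when we back off a contended lock */
};

/*
 * Sync compaction waits for the lock; async compaction refuses to and
 * marks the scan as contended so the caller aborts it. More lock
 * traffic therefore means async compaction terminates earlier.
 */
static bool toy_checklock(pthread_mutex_t *lock, struct toy_compact_control *cc)
{
	if (pthread_mutex_trylock(lock) == 0)
		return true;			/* got the lock, keep scanning */

	if (!cc->sync) {
		cc->contended = true;		/* async: record and abort */
		return false;
	}

	pthread_mutex_lock(lock);		/* sync: just wait for it */
	return true;
}

int main(void)
{
	pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	struct toy_compact_control cc = { .sync = false, .contended = false };

	/* Uncontended here, so this succeeds; under real contention the
	 * async case would set cc.contended and return false. */
	if (toy_checklock(&lock, &cc))
		pthread_mutex_unlock(&lock);

	return cc.contended ? 1 : 0;
}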

Please try the following patch. However, even if it works, the benefit of
capture may be so marginal that partially reverting it and simplifying
compaction.c is the better decision.
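
For context on the first hunk: compact_finished() is meant to walk every
order from cc->order up to MAX_ORDER, but the buggy version indexed
zone->free_area[] with cc->order inside the loop, so it only ever examined
one free list. A toy stand-alone demonstration of the effect (illustrative
data only, not kernel code):

#include <stdbool.h>
#include <stdio.h>

#define MAX_ORDER 11

/* Toy stand-in for zone->free_area[]: true means a suitable free page
 * of that order exists. Purely illustrative. */
static bool free_page_at[MAX_ORDER];

/* Buggy form: always checks the requested order, so a free page at a
 * higher order is never noticed. */
static bool done_buggy(unsigned int requested)
{
	unsigned int order;

	for (order = requested; order < MAX_ORDER; order++)
		if (free_page_at[requested])	/* bug: should use [order] */
			return true;
	return false;
}

/* Fixed form, matching the patch: index with the loop variable. */
static bool done_fixed(unsigned int requested)
{
	unsigned int order;

	for (order = requested; order < MAX_ORDER; order++)
		if (free_page_at[order])
			return true;
	return false;
}

int main(void)
{
	free_page_at[5] = true;			/* only an order-5 page is free */

	/* Prints "buggy: 0 fixed: 1" */
	printf("buggy: %d fixed: %d\n", done_buggy(3), done_fixed(3));
	return 0;
}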

diff --git a/mm/compaction.c b/mm/compaction.c
index 6b807e4..03c82c0 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -857,7 +857,8 @@ static int compact_finished(struct zone *zone,
 	} else {
 		unsigned int order;
 		for (order = cc->order; order < MAX_ORDER; order++) {
-			struct free_area *area = &zone->free_area[cc->order];
+			struct free_area *area = &zone->free_area[order];
+
 			/* Job done if page is free of the right migratetype */
 			if (!list_empty(&area->free_list[cc->migratetype]))
 				return COMPACT_PARTIAL;
@@ -929,6 +930,11 @@ static void compact_capture_page(struct compact_control *cc)
 	if (!cc->page || *cc->page)
 		return;
 
+	/* Check that watermarks are satisfied before acquiring locks */
+	if (!zone_watermark_ok(cc->zone, cc->order, low_wmark_pages(cc->zone),
+									0, 0))
+		return;
+
 	/*
 	 * For MIGRATE_MOVABLE allocations we capture a suitable page ASAP
 	 * regardless of the migratetype of the freelist it is captured from.
@@ -941,7 +947,7 @@ static void compact_capture_page(struct compact_control *cc)
 	 */
 	if (cc->migratetype == MIGRATE_MOVABLE) {
 		mtype_low = 0;
-		mtype_high = MIGRATE_PCPTYPES;
+		mtype_high = MIGRATE_PCPTYPES + 1;
 	} else {
 		mtype_low = cc->migratetype;
 		mtype_high = cc->migratetype + 1;
@@ -1118,7 +1124,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 	struct zoneref *z;
 	struct zone *zone;
 	int rc = COMPACT_SKIPPED;
-	int alloc_flags = 0;
 
 	/* Check if the GFP flags allow compaction */
 	if (!order || !may_enter_fs || !may_perform_io)
@@ -1126,10 +1131,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 
 	count_compact_event(COMPACTSTALL);
 
-#ifdef CONFIG_CMA
-	if (allocflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE)
-		alloc_flags |= ALLOC_CMA;
-#endif
 	/* Compact each zone in the list */
 	for_each_zone_zonelist_nodemask(zone, z, zonelist, high_zoneidx,
 								nodemask) {
@@ -1139,9 +1140,8 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
 						contended, page);
 		rc = max(status, rc);
 
-		/* If a normal allocation would succeed, stop compacting */
-		if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0,
-				      alloc_flags))
+		/* If a page was captured, stop compacting */
+		if (*page)
 			break;
 	}
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4ba5e37..9d20c13 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2180,10 +2180,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 	current->flags &= ~PF_MEMALLOC;
 
 	/* If compaction captured a page, prep and use it */
-	if (page) {
-		prep_new_page(page, order, gfp_mask);
+	if (page && !prep_new_page(page, order, gfp_mask))
 		goto got_page;
-	}
 
 	if (*did_some_progress != COMPACT_SKIPPED) {
 		/* Page migration frees to the PCP lists but we want merging */