lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1305247626.2575.111.camel@mulgrave.site>
Date:	Thu, 12 May 2011 19:47:05 -0500
From:	James Bottomley <James.Bottomley@...senPartnership.com>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Pekka Enberg <penberg@...nel.org>,
	Christoph Lameter <cl@...ux.com>, Mel Gorman <mgorman@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Colin King <colin.king@...onical.com>,
	Raghavendra D Prabhu <raghu.prabhu13@...il.com>,
	Jan Kara <jack@...e.cz>, Chris Mason <chris.mason@...cle.com>,
	Rik van Riel <riel@...hat.com>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH 3/3] mm: slub: Default slub_max_order to 0

On Fri, 2011-05-13 at 00:15 +0200, Johannes Weiner wrote:
> On Thu, May 12, 2011 at 05:04:41PM -0500, James Bottomley wrote:
> > On Thu, 2011-05-12 at 15:04 -0500, James Bottomley wrote:
> > > Confirmed, I'm afraid ... I can trigger the problem with all three
> > > patches under PREEMPT.  It's not a hang this time, it's just kswapd
> > > taking 100% system time on 1 CPU and it won't calm down after I unload
> > > the system.
> > 
> > Just on a "if you don't know what's wrong poke about and see" basis, I
> > sliced out all the complex logic in sleeping_prematurely() and, as far
> > as I can tell, it cures the problem behaviour.  I've loaded up the
> > system, and taken the tar load generator through three runs without
> > producing a spinning kswapd (this is PREEMPT).  I'll try with a
> > non-PREEMPT kernel shortly.
> > 
> > What this seems to say is that there's a problem with the complex logic
> > in sleeping_prematurely().  I'm pretty sure hacking up
> > sleeping_prematurely() just to dump all the calculations is the wrong
> > thing to do, but perhaps someone can see what the right thing is ...
> 
> I think I see the problem: the boolean logic of sleeping_prematurely()
> is odd.  If it returns true, kswapd will keep running.  So if
> pgdat_balanced() returns true, kswapd should go to sleep.
> 
> This?

I was going to say this was a winner, but on the third untar run on
non-PREEMPT, I hit the kswapd livelock.  It's got much farther than
previous attempts, which all hang on the first run, but I think the
essential problem is still (at least on this machine) that
sleeping_prematurely() is doing too much work for the wakeup storm that
allocators are causing.

Something that ratelimits the amount of time we spend in the watermark
calculations, like the below (which incorporates your pgdat fix) seems
to be much more stable (I've not run it for three full runs yet, but
kswapd CPU time is way lower so far).

The heuristic here is that if we're making the calculation more than ten
times in 1/10 of a second, stop and sleep anyway.

James

---

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0665520..545250c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2249,12 +2249,32 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 {
 	int i;
 	unsigned long balanced = 0;
-	bool all_zones_ok = true;
+	bool all_zones_ok = true, ret;
+	static int returned_true = 0;
+	static unsigned long prev_jiffies = 0;
+	
 
 	/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
 	if (remaining)
 		return true;
 
+	/* rate limit our entry to the watermark calculations */
+	if (time_after(prev_jiffies + HZ/10, jiffies)) {
+		/* previously returned false, do so again */
+		if (returned_true == 0)
+			return false;
+		/* or we've done the true calculation too many times */
+		if (returned_true++ > 10)
+			return false;
+
+		return true;
+	} else {
+		/* haven't been here for a while, reset the true count */
+		returned_true = 0;
+	}
+
+	prev_jiffies = jiffies;
+
 	/* Check the watermark levels */
 	for (i = 0; i < pgdat->nr_zones; i++) {
 		struct zone *zone = pgdat->node_zones + i;
@@ -2286,9 +2306,16 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 	 * must be balanced
 	 */
 	if (order)
-		return pgdat_balanced(pgdat, balanced, classzone_idx);
+		ret = !pgdat_balanced(pgdat, balanced, classzone_idx);
+	else
+		ret = !all_zones_ok;
+
+	if (ret)
+		returned_true++;
 	else
-		return !all_zones_ok;
+		returned_true = 0;
+
+	return ret;
 }
 
 /*


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ