Message-ID: <1305247626.2575.111.camel@mulgrave.site>
Date:	Thu, 12 May 2011 19:47:05 -0500
From:	James Bottomley <James.Bottomley@...senPartnership.com>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Pekka Enberg <penberg@...nel.org>,
	Christoph Lameter <cl@...ux.com>, Mel Gorman <mgorman@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Colin King <colin.king@...onical.com>,
	Raghavendra D Prabhu <raghu.prabhu13@...il.com>,
	Jan Kara <jack@...e.cz>, Chris Mason <chris.mason@...cle.com>,
	Rik van Riel <riel@...hat.com>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	linux-mm <linux-mm@...ck.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH 3/3] mm: slub: Default slub_max_order to 0

On Fri, 2011-05-13 at 00:15 +0200, Johannes Weiner wrote:
> On Thu, May 12, 2011 at 05:04:41PM -0500, James Bottomley wrote:
> > On Thu, 2011-05-12 at 15:04 -0500, James Bottomley wrote:
> > > Confirmed, I'm afraid ... I can trigger the problem with all three
> > > patches under PREEMPT.  It's not a hang this time, it's just kswapd
> > > taking 100% system time on 1 CPU and it won't calm down after I unload
> > > the system.
> > 
> > Just on a "if you don't know what's wrong poke about and see" basis, I
> > sliced out all the complex logic in sleeping_prematurely() and, as far
> > as I can tell, it cures the problem behaviour.  I've loaded up the
> > system, and taken the tar load generator through three runs without
> > producing a spinning kswapd (this is PREEMPT).  I'll try with a
> > non-PREEMPT kernel shortly.
> > 
> > What this seems to say is that there's a problem with the complex logic
> > in sleeping_prematurely().  I'm pretty sure hacking up
> > sleeping_prematurely() just to dump all the calculations is the wrong
> > thing to do, but perhaps someone can see what the right thing is ...
> 
> I think I see the problem: the boolean logic of sleeping_prematurely()
> is odd.  If it returns true, kswapd will keep running.  So if
> pgdat_balanced() returns true, kswapd should go to sleep.
> 
> This?

I was going to say this was a winner, but on the third untar run on
non-PREEMPT, I hit the kswapd livelock.  It got much farther than
previous attempts, which all hung on the first run, but I think the
essential problem is still (at least on this machine) that
sleeping_prematurely() is doing too much work for the wakeup storm that
the allocators are causing.

Something that rate limits the amount of time we spend in the watermark
calculations, like the patch below (which incorporates your pgdat fix),
seems to be much more stable (I've not run it for three full runs yet,
but kswapd CPU time is way lower so far).

The heuristic here is that if we're making the calculation more than ten
times in 1/10 of a second, stop and sleep anyway.
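
As a rough userspace illustration of that heuristic (a sketch only, not
the kernel code), the rate limiting can be modelled as below.  The names
sleep_limiter, limited_sleeping_prematurely(), SKETCH_WINDOW_TICKS and
the two sample callbacks are hypothetical; the tick counter stands in
for jiffies, the 100-tick window stands in for HZ/10, and the "started"
flag is an addition so the model behaves sensibly from tick 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: a 100-tick window plays the role of HZ/10. */
#define SKETCH_WINDOW_TICKS 100UL
#define SKETCH_MAX_TRUE     10

struct sleep_limiter {
	unsigned long window_start; /* tick of the last full check */
	int returned_true;          /* "premature" answers given in a row */
	bool started;               /* false until the first full check */
	unsigned long full_checks;  /* how often the expensive check ran */
};

/* The expensive check: returns true when sleeping would be premature. */
typedef bool (*premature_fn)(void *ctx);

/* Sample callbacks for experimenting with the limiter. */
static bool always_premature(void *ctx) { (void)ctx; return true; }
static bool never_premature(void *ctx) { (void)ctx; return false; }

/*
 * Returns true if the caller should keep running, false if it should
 * sleep.  Inside the window the expensive check is skipped entirely.
 */
static bool limited_sleeping_prematurely(struct sleep_limiter *rl,
					 unsigned long now,
					 premature_fn full_check, void *ctx)
{
	bool ret;

	assert(rl != NULL && full_check != NULL);

	/* Wrap-safe "now is before window_start + window" comparison. */
	if (rl->started &&
	    (long)(now - (rl->window_start + SKETCH_WINDOW_TICKS)) < 0) {
		/* Previously answered "not premature": do so again. */
		if (rl->returned_true == 0)
			return false;
		/* Too many "premature" answers in this window: sleep anyway. */
		if (rl->returned_true++ > SKETCH_MAX_TRUE)
			return false;
		return true;
	}

	/* Outside the window: reset the count and run the real check. */
	rl->started = true;
	rl->returned_true = 0;
	rl->window_start = now;

	rl->full_checks++;
	ret = full_check(ctx);
	if (ret)
		rl->returned_true++;
	return ret;
}
```

Keeping the state in a caller-supplied struct, rather than in function
statics, is a choice made for the sketch so that each caller can have
its own window.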

James

---

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0665520..545250c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2249,12 +2249,32 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 {
 	int i;
 	unsigned long balanced = 0;
-	bool all_zones_ok = true;
+	bool all_zones_ok = true, ret;
+	static int returned_true = 0;
+	static unsigned long prev_jiffies = 0;
+	
 
 	/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
 	if (remaining)
 		return true;
 
+	/* rate limit our entry to the watermark calculations */
+	if (time_after(prev_jiffies + HZ/10, jiffies)) {
+		/* previously returned false, do so again */
+		if (returned_true == 0)
+			return false;
+		/* or we've done the true calculation too many times */
+		if (returned_true++ > 10)
+			return false;
+
+		return true;
+	} else {
+		/* haven't been here for a while, reset the true count */
+		returned_true = 0;
+	}
+
+	prev_jiffies = jiffies;
+
 	/* Check the watermark levels */
 	for (i = 0; i < pgdat->nr_zones; i++) {
 		struct zone *zone = pgdat->node_zones + i;
@@ -2286,9 +2306,16 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 	 * must be balanced
 	 */
 	if (order)
-		return pgdat_balanced(pgdat, balanced, classzone_idx);
+		ret = !pgdat_balanced(pgdat, balanced, classzone_idx);
+	else
+		ret = !all_zones_ok;
+
+	if (ret)
+		returned_true++;
 	else
-		return !all_zones_ok;
+		returned_true = 0;
+
+	return ret;
 }
 
 /*
