Message-ID: <20100418062723.GA19747@Krystal>
Date:	Sun, 18 Apr 2010 02:27:23 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mike Galbraith <efault@....de>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Greg Kroah-Hartman <greg@...ah.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Jarkko Nikula <jhnikula@...il.com>,
	Tony Lindgren <tony@...mide.com>, linux-kernel@...r.kernel.org
Subject: [RFC patch] CFS fix place entity spread issue

A huge CFS vruntime spread (18 minutes) has been observed with LTTng while simply
running Xorg on a uniprocessor machine with a 2.6.33.2 kernel. A detailed
explanation is in my ELC2010 presentation at:

http://www.efficios.com/elc2010

(includes slides, ad-hoc CFS instrumentation patches and a wakeup-latency test
program)

I've torn the CFS scheduler apart over the past few days to figure out what is
causing this weird behavior, and the culprit seems to be place_entity(). The
problem appears to be the cumulative effect of letting min_vruntime go backward
when putting sleepers back on the runqueue. This lets the vruntime spread grow
to "entertaining" values (it is supposed to stay in the 5ms range, not reach
18 minutes!).

In the original code, a max of the sched entity's vruntime and the calculated
vruntime was supposed to "ensure that the thread time never goes backward". But
I don't see why we even care about that. The key point is that the min_vruntime
of the runqueue should not go backward.

I propose to fix this by calculating the sleeper offset relative to
min_vruntime + sysctl_sched_latency rather than directly from min_vruntime, and
by ensuring that the resulting value never goes below min_vruntime.
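
To make the before/after concrete, here is a tiny userspace toy model of the
placement arithmetic only (not kernel code). The numeric values are made up for
illustration, and GENTLE_FAIR_SLEEPERS is assumed enabled, so the sleeper
threshold is half the latency; the formulas mirror the patch below.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* Arbitrary example values, in nanoseconds of virtual time. */
	uint64_t min_vruntime = 1000000000ULL;	/* current cfs_rq->min_vruntime */
	uint64_t se_vruntime  =  400000000ULL;	/* task that slept for a long time */
	uint64_t latency      =    6000000ULL;	/* sysctl_sched_latency (6 ms) */
	uint64_t thresh       = latency >> 1;	/* GENTLE_FAIR_SLEEPERS assumed on */

	/* Old: offset taken from min_vruntime, clamped only against se->vruntime. */
	uint64_t old_place = min_vruntime - thresh;
	if (se_vruntime > old_place)
		old_place = se_vruntime;

	/* New: offset taken from min_vruntime + latency, never below min_vruntime. */
	uint64_t new_place = min_vruntime + latency - thresh;
	if (new_place < min_vruntime)
		new_place = min_vruntime;

	printf("old placement: %.1f ms below min_vruntime\n",
	       (double)(min_vruntime - old_place) / 1e6);
	printf("new placement: %.1f ms above min_vruntime\n",
	       (double)(new_place - min_vruntime) / 1e6);
	return 0;
}

With these numbers the old formula places the woken long sleeper 3ms below
min_vruntime, while the new formula places it 3ms above and can never go below
min_vruntime, which is the point of the change.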

Under the Xorg workload, moving a few windows around and starting Firefox while
executing the wakeup-latency.c program (a program that wakes up every 10ms and
reports its wakeup latency), this patch brings the worst latency from 60ms down
to 12ms. Even with a kernel compilation running at the same time, the worst
latency now stays around 20ms.
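
For reference, the measurement loop involved looks roughly like the following.
This is a minimal sketch written from the description above, not the actual
wakeup-latency.c from the slides (see the ELC2010 URL for that); the period and
iteration count are my own choices.

#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

#define PERIOD_NS   10000000L		/* wake up every 10 ms */
#define ITERATIONS  1000		/* roughly 10 seconds of sampling */

static long long ns_of(const struct timespec *t)
{
	return (long long)t->tv_sec * 1000000000LL + t->tv_nsec;
}

int main(void)
{
	struct timespec next, now;
	long long worst = 0;
	int i;

	clock_gettime(CLOCK_MONOTONIC, &next);
	for (i = 0; i < ITERATIONS; i++) {
		long long late;

		/* Advance the absolute deadline by one period. */
		next.tv_nsec += PERIOD_NS;
		while (next.tv_nsec >= 1000000000L) {
			next.tv_nsec -= 1000000000L;
			next.tv_sec++;
		}
		clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
		clock_gettime(CLOCK_MONOTONIC, &now);

		late = ns_of(&now) - ns_of(&next);	/* how late did we wake up? */
		if (late > worst)
			worst = late;
	}
	printf("worst wakeup latency: %lld us\n", worst / 1000);
	return 0;
}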

I'm submitting this patch ASAP, since it seems to fix CFS issues that many
people have been complaining about. I'm sending it as an RFC because testing
its effect on more workloads would be welcome.

I can see that place_entity() has stayed more or less the same since 2.6.24
(and maybe even before, as the code was merely reorganised between 2.6.23 and
2.6.24), so we can expect this to be a problem people have been experiencing
for a while.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
CC: Ingo Molnar <mingo@...e.hu>
CC: Peter Zijlstra <a.p.zijlstra@...llo.nl>
CC: Mike Galbraith <efault@....de>
CC: Andrew Morton <akpm@...ux-foundation.org>
CC: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Greg Kroah-Hartman <greg@...ah.com>
CC: Steven Rostedt <rostedt@...dmis.org>
CC: Jarkko Nikula <jhnikula@...il.com>
CC: Tony Lindgren <tony@...mide.com>
---
 kernel/sched_fair.c |   13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Index: linux-2.6-lttng.git/kernel/sched_fair.c
===================================================================
--- linux-2.6-lttng.git.orig/kernel/sched_fair.c	2010-04-18 01:44:19.000000000 -0400
+++ linux-2.6-lttng.git/kernel/sched_fair.c	2010-04-18 01:47:38.000000000 -0400
@@ -738,6 +738,14 @@
 		unsigned long thresh = sysctl_sched_latency;
 
 		/*
+		 * Place the woken up task relative to
+		 * min_vruntime + sysctl_sched_latency.
+		 * We must _never_ decrement min_vruntime, because the effect is
+		 * that spread increases progressively under the Xorg workload.
+		 */
+		vruntime += sysctl_sched_latency;
+
+		/*
 		 * Convert the sleeper threshold into virtual time.
 		 * SCHED_IDLE is a special sub-class.  We care about
 		 * fairness only relative to other SCHED_IDLE tasks,
@@ -755,11 +763,10 @@
 			thresh >>= 1;
 
 		vruntime -= thresh;
+		/* Ensure min_vruntime never goes backwards. */
+		vruntime = max_t(u64, vruntime, cfs_rq->min_vruntime);
 	}
 
-	/* ensure we never gain time by being placed backwards. */
-	vruntime = max_vruntime(se->vruntime, vruntime);
-
 	se->vruntime = vruntime;
 }
 
-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
