linux-kernel - Re: CFS review

Message-ID: <20070812155242.GA1977@elte.hu>
Date:	Sun, 12 Aug 2007 17:52:44 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Al Boldi <a1426z@...ab.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Mike Galbraith <efault@....de>,
	Roman Zippel <zippel@...ux-m68k.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: CFS review


* Al Boldi <a1426z@...ab.com> wrote:

> > so could you please re-check chew jitter behavior with the latest 
> > kernel? (i've attached the standalone patch below, it will apply 
> > cleanly to rc2 too.)
> 
> That fixes it, but by reducing granularity ctx is up 4-fold.

ok, great! (the context-switch rate is obviously up.)

> Mind you, it does have an enormous effect on responsiveness, as 
> negative nice with small granularity can't hijack the system any more.

ok. i'm glad you like the result :-) This makes reniced X (or any 
reniced app) more usable.

> The thing is, this unpredictability seems to exist even at nice level 
> 0, but the smaller granularity covers it all up.  It occasionally 
> exhibits itself as hick-ups during transient heavy workload flux.  But 
> it's not easily reproducible.

In general, "hiccups" can have many, many causes. Whether a task was 
actually delayed by scheduling jitter is provable, even if the 
behavior is hard to reproduce, by enabling CONFIG_SCHED_DEBUG=y and 
CONFIG_SCHEDSTATS=y in your kernel. First clear all the stats:

  for N in /proc/*/task/*/sched; do echo 0 > "$N"; done

then wait for the 'hiccup' to happen, and once it happens capture the 
system state (after the hiccup) via this script:

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

and tell me which specific task exhibited that 'hiccup' and send me the 
debug output. Also, could you try the patch below as well? Thanks,

	Ingo

-------------------------------->
Subject: sched: fix sleeper bonus
From: Ingo Molnar <mingo@...e.hu>

Peter Zijlstra noticed that the sleeper bonus deduction code
was not properly rate-limited: a task that scheduled more
frequently would get a disproportionately large deduction.
So limit the deduction to delta_exec.

Signed-off-by: Ingo Molnar <mingo@...e.hu>
---
 kernel/sched_fair.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux/kernel/sched_fair.c
===================================================================
--- linux.orig/kernel/sched_fair.c
+++ linux/kernel/sched_fair.c
@@ -75,7 +75,7 @@ enum {
 
 unsigned int sysctl_sched_features __read_mostly =
 		SCHED_FEAT_FAIR_SLEEPERS	*1 |
-		SCHED_FEAT_SLEEPER_AVG		*1 |
+		SCHED_FEAT_SLEEPER_AVG		*0 |
 		SCHED_FEAT_SLEEPER_LOAD_AVG	*1 |
 		SCHED_FEAT_PRECISE_CPU_LOAD	*1 |
 		SCHED_FEAT_START_DEBIT		*1 |
@@ -304,11 +304,9 @@ __update_curr(struct cfs_rq *cfs_rq, str
 	delta_mine = calc_delta_mine(delta_exec, curr->load.weight, lw);
 
 	if (cfs_rq->sleeper_bonus > sysctl_sched_granularity) {
-		delta = calc_delta_mine(cfs_rq->sleeper_bonus,
-					curr->load.weight, lw);
-		if (unlikely(delta > cfs_rq->sleeper_bonus))
-			delta = cfs_rq->sleeper_bonus;
-
+		delta = min(cfs_rq->sleeper_bonus, (u64)delta_exec);
+		delta = calc_delta_mine(delta, curr->load.weight, lw);
+		delta = min((u64)delta, cfs_rq->sleeper_bonus);
 		cfs_rq->sleeper_bonus -= delta;
 		delta_mine -= delta;
 	}
@@ -521,6 +519,8 @@ static void __enqueue_sleeper(struct cfs
 	 * Track the amount of bonus we've given to sleepers:
 	 */
 	cfs_rq->sleeper_bonus += delta_fair;
+	if (unlikely(cfs_rq->sleeper_bonus > sysctl_sched_runtime_limit))
+		cfs_rq->sleeper_bonus = sysctl_sched_runtime_limit;
 
 	schedstat_add(cfs_rq, wait_runtime, se->wait_runtime);
 }
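
To make the effect of the __update_curr() hunk above easier to see, here is
a small userspace sketch of the old and new deduction arithmetic. It is not
kernel code: scale_by_weight() is a hypothetical stand-in that assumes
calc_delta_mine() scales its input roughly by weight / total_weight, and
deduction_old() / deduction_new() are illustrative names that do not exist
in sched_fair.c.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

static u64 min_u64(u64 a, u64 b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical stand-in for calc_delta_mine(): assumes plain proportional
 * scaling by weight / total_weight. The kernel version uses fixed-point
 * arithmetic, so this is only an approximation.
 */
static u64 scale_by_weight(u64 delta, u64 weight, u64 total_weight)
{
	assert(total_weight != 0);
	return delta * weight / total_weight;
}

/*
 * Old behaviour: the deduction is derived from the whole accumulated
 * bonus, so it does not depend on how long the task actually ran.
 */
static u64 deduction_old(u64 bonus, u64 delta_exec, u64 weight, u64 total)
{
	u64 delta = scale_by_weight(bonus, weight, total);

	(void)delta_exec;	/* not consulted by the old code */
	return min_u64(delta, bonus);
}

/*
 * New behaviour: clamp to delta_exec first, then scale, then make sure
 * we never deduct more than the bonus that is left.
 */
static u64 deduction_new(u64 bonus, u64 delta_exec, u64 weight, u64 total)
{
	u64 delta = min_u64(bonus, delta_exec);

	delta = scale_by_weight(delta, weight, total);
	return min_u64(delta, bonus);
}
```

Under these assumptions, with a 20 msec bonus (20000000 nsec), a task that
ran for 1 msec, and a weight of 1024 out of a total of 2048, the old formula
deducts 10000000 nsec on every update no matter how short the run was, while
the new one deducts 500000 nsec. A task that schedules often therefore no
longer pays the large deduction once per update.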