linux-kernel - [RFC PATCH] nohz/sched: disable ilb on !mc

Message-ID: <20100408195941.GA5040@comet.dominikbrodowski.net>
Date:	Thu, 8 Apr 2010 21:59:41 +0200
From:	Dominik Brodowski <linux@...inikbrodowski.net>
To:	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>,
	Peter Zijlstra <peterz@...radead.org>,
	Arjan van de Ven <arjan@...ux.intel.com>
Subject: [RFC PATCH] nohz/sched: disable ilb on !mc_capable()

On Sun, Apr 04, 2010 at 12:33:28AM +0200, Dominik Brodowski wrote:
>
> 2) dual-core CPU[*] and select_nohz_load_balancer()
> [*] (Intel(R) Core(TM)2 Duo CPU T7250)
> 
> # CONFIG_SCHED_SMT is not set
> CONFIG_SCHED_MC=y
> CONFIG_SCHED_HRTICK=y
> 
> CONFIG_SCHED_MC is ignored, as mc_capable() returns 0 on a one-socket,
> dual-core system. Quite surprisingly, even under moderate load (~98.0% idle)
> while writing this bugreport, up to half of the calls to
> tick_nohz_stop_sched_tick() are aborted due to select_nohz_load_balancer(1):
> 
> 		if (atomic_read(&nohz.load_balancer) == -1) {
> 			/* make me the ilb owner */
> 			if (atomic_cmpxchg(&nohz.load_balancer, -1, cpu) == -1)
> 				return 1;
> 
> I'm not really sure, but I guess this is caused by the following pattern:
> the load is minor, but every once in a while there is parallel work for
> both CPUs:
> 
> CPU #0					CPU #1
> 
> <active>				<active>
> <idle>					<active>
>   tick_nohz_stop_sched_tick(1)		<active>
>    select_nohz_load_balancer(1)		<active>
>     => becomes ilb owner		<idle>
>    => tick is not stopped		 tick_nohz_stop_sched_tick(1)
>   => CPU goes to sleep for 1 tick	  => as it isn't the ILB owner, tick
>   <sleep for 1 tick>			     is stopped	.
>   ---> scheduler_tick()			  <sleeeeeeeep>
>   tick_nohz_stop_sched_tick(0)
> <still idle>
>   tick_nohz_stop_sched_tick(1)
>    select_nohz_load_balancer(1)
>     => is ilb owner, all CPUs idle,
>        may go to sleep.
> 
> If both CPUs have hardly anything to do, letting the _active_ CPU do ilb
> allows us to enter deep sleep states earlier, and longer:
> 
> current ILB model (* = ILB)
> 
> 	tick ---------- tick -------- tick ----- IRQ
> CPU0:   active|IDLE(C2)--|*|IDLE (C3)             |
> CPU1:   active....| IDLE (C3)                     |
> core:   .......???| C2   |           C3           |
> 
> ILB-by-active-CPU-on-light-load:
> 
> 	tick ---------- tick -------- tick ----- IRQ
> CPU0:   active|IDLE(C3)                           |
> CPU1:   active....*| IDLE (C3)                    |
> core:   .......????|               C3             |
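
The ownership sequence in the quoted timeline can be reproduced with a small
user-space model. This is only a sketch under stated assumptions: C11 atomics
stand in for the kernel's atomic_t, and load_balancer, cpu_idle[] and
model_select_ilb() are hypothetical stand-ins for nohz.load_balancer,
nohz.cpu_mask and the stop_tick branch of select_nohz_load_balancer(). The
!cpu_active() handling and the stop_tick == 0 path are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 2

/* Hypothetical stand-ins for nohz.load_balancer and nohz.cpu_mask. */
static atomic_int load_balancer = -1;	/* -1: nobody owns ilb */
static bool cpu_idle[NR_CPUS];		/* false: CPU is busy */

/*
 * Simplified model of select_nohz_load_balancer(1).
 * Returns 1 if the caller has to keep its tick running (it owns ilb
 * while another CPU is still busy), 0 if it may stop the tick.
 */
static int model_select_ilb(int cpu)
{
	int expected = -1;
	int i, all_idle = 1;

	assert(cpu >= 0 && cpu < NR_CPUS);
	cpu_idle[cpu] = true;

	for (i = 0; i < NR_CPUS; i++)
		all_idle &= cpu_idle[i];

	if (all_idle) {
		/* Everyone is idle: the ilb owner may sleep as well. */
		if (atomic_load(&load_balancer) == cpu)
			atomic_store(&load_balancer, -1);
		return 0;
	}

	/* Make me the ilb owner if nobody owns it yet. */
	if (atomic_compare_exchange_strong(&load_balancer, &expected, cpu))
		return 1;

	/* The current owner keeps ticking; everybody else may stop. */
	return atomic_load(&load_balancer) == cpu;
}
```

Calling model_select_ilb(0), then model_select_ilb(1), then
model_select_ilb(0) again follows the timeline: the first call makes CPU #0
the owner and keeps its tick, the second lets CPU #1 stop its tick, and only
the third call, one tick later, lets CPU #0 stop its tick too.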

Tested this a bit further, and thought about it a bit further:

On systems like my laptop, which has one physical CPU with two cores
( = SMP, !mc_capable() ), the "idle load balancing" seems to be _not_
necessary at all:

- if both cores are active, ilb is inactive anyway.

- if no core is active, ilb is inactive anyway.

- if only one core is active and busy, it seems to attempt to balance its
  load on each tick anyway, so ilb wouldn't act any quicker.

The attached patch decreases the number of wakeups on my completely idle
notebook ( init=/bin/bash ) from ~2 wakeups-per-second[*] to ~0.7. During
normal system usage, the number of wakeups-per-second seems to decrease as
well, but this is harder to measure. More importantly, over 80 % of all
calls to tick_nohz_stop_sched_tick() succeed immediately[**].

[*] needs a USB-autosuspend bugfix, manual enabling of USB autosuspend, and
    disabling of the blinking fb cursor.

[**] about 10% return due to rcu_needs_cpu(), which often means the CPU can
    go to sleep pretty soon afterwards.

The remaining reports of "tick_sched_timer" in powertop(1) seem to be
related to timer ticks when one CPU is active for at least one jiffy. So
this is probably not a real "wakeup" at all.
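
For illustration, the effect of the early return can be modelled in user
space. This is a rough sketch, not the kernel code: model_mc_capable,
ilb_owner and model_select_ilb_patched() are hypothetical names, C11 atomics
replace atomic_t, and all cpu_mask bookkeeping is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stub: a one-socket, dual-core box reports false here. */
static bool model_mc_capable = false;
static atomic_int ilb_owner = -1;	/* -1: nobody owns ilb */

/* Returns 1 if the caller has to keep its tick, 0 if it may stop it. */
static int model_select_ilb_patched(int cpu, int stop_tick)
{
	int expected = -1;

	assert(cpu >= 0);

	/* The proposed early exit: never elect an ilb owner on !mc. */
	if (!model_mc_capable)
		return 0;

	if (stop_tick &&
	    atomic_compare_exchange_strong(&ilb_owner, &expected, cpu))
		return 1;	/* new owner, tick keeps running */

	return 0;
}
```

With model_mc_capable left at false, no CPU ever becomes the owner, so the
first idle CPU can stop its tick right away instead of one tick later.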

Best,
	Dominik


From: Dominik Brodowski <linux@...inikbrodowski.net>
Date: Thu, 8 Apr 2010 21:51:18 +0200
Subject: [PATCH] nohz/sched: disable ilb on !mc_capable()

Signed-off-by: Dominik Brodowski <linux@...inikbrodowski.net>

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index 5a5ea2c..8ad8a03 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -3290,6 +3290,9 @@ int select_nohz_load_balancer(int stop_tick)
 	if (stop_tick) {
 		cpu_rq(cpu)->in_nohz_recently = 1;
 
+		if (!mc_capable())
+			return 0;
+
 		if (!cpu_active(cpu)) {
 			if (atomic_read(&nohz.load_balancer) != cpu)
 				return 0;
@@ -3339,6 +3342,9 @@ int select_nohz_load_balancer(int stop_tick)
 		if (!cpumask_test_cpu(cpu, nohz.cpu_mask))
 			return 0;
 
+		if (!mc_capable())
+			return 0;
+
 		cpumask_clear_cpu(cpu, nohz.cpu_mask);
 
 		if (atomic_read(&nohz.load_balancer) == cpu)