linux-kernel - Re: [patch 10/15] sched/migration: Move calc_load_migrate() into CPU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <alpine.DEB.2.11.1607121744350.4083@nanos>
Date:	Tue, 12 Jul 2016 18:33:56 +0200 (CEST)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Anton Blanchard <anton@...ba.org>
cc:	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>, rt@...utronix.de,
	Michael Ellerman <mpe@...erman.id.au>,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
	shreyas@...ux.vnet.ibm.com
Subject: Re: [patch 10/15] sched/migration: Move calc_load_migrate() into
 CPU_DYING

Anton,

On Tue, 12 Jul 2016, Anton Blanchard wrote:
> > It really does not matter when we fold the load for the outgoing cpu.
> > It's almost dead anyway, so there is no harm if we fail to fold the
> > few microseconds which are required for going fully away.
> 
> We are seeing the load average shoot up when hot unplugging CPUs (+1
> for every CPU we offline) on ppc64. This reproduces on bare metal as
> well as inside a KVM guest. A bisect points at this commit.
> 
> As an example, a completely idle box with 128 CPUS and 112 hot
> unplugged:
> 
> # uptime
>  04:35:30 up  1:23,  2 users,  load average: 112.43, 122.94, 125.54

Yes, it's an off by one as we now call that from the task which is tearing
down the cpu. Does the patch below fix it?

Thanks,

	tglx

8<----------------------

Subject: sched/migration: Correct off by one in load migration
From: Thomas Gleixner <tglx@...utronix.de>

The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
account that the function is now called from a thread running on the outgoing
CPU. As a result a cpu unplug leakes a load of 1 into the global load
accounting mechanism.

Fix it by adjusting for the currently running thread which calls
calc_load_migrate().

Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
Reported-by: Anton Blanchard <anton@...ba.org>
Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51d7105f529a..97ee9ac7e97c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5394,13 +5394,15 @@ void idle_task_exit(void)
 /*
  * Since this CPU is going 'away' for a while, fold any nr_active delta
  * we might have. Assumes we're called after migrate_tasks() so that the
- * nr_active count is stable.
+ * nr_active count is stable. We need to take the teardown thread which
+ * is calling this into account, so we hand in adjust = 1 to the load
+ * calculation.
  *
  * Also see the comment "Global load-average calculations".
  */
 static void calc_load_migrate(struct rq *rq)
 {
-	long delta = calc_load_fold_active(rq);
+	long delta = calc_load_fold_active(rq, 1);
 	if (delta)
 		atomic_long_add(delta, &calc_load_tasks);
 }
diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
index b0b93fd33af9..a2d6eb71f06b 100644
--- a/kernel/sched/loadavg.c
+++ b/kernel/sched/loadavg.c
@@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
 	loads[2] = (avenrun[2] + offset) << shift;
 }
 
-long calc_load_fold_active(struct rq *this_rq)
+long calc_load_fold_active(struct rq *this_rq, long adjust)
 {
 	long nr_active, delta = 0;
 
-	nr_active = this_rq->nr_running;
+	nr_active = this_rq->nr_running - adjust;
 	nr_active += (long)this_rq->nr_uninterruptible;
 
 	if (nr_active != this_rq->calc_load_active) {
@@ -188,7 +188,7 @@ void calc_load_enter_idle(void)
 	 * We're going into NOHZ mode, if there's any pending delta, fold it
 	 * into the pending idle delta.
 	 */
-	delta = calc_load_fold_active(this_rq);
+	delta = calc_load_fold_active(this_rq, 0);
 	if (delta) {
 		int idx = calc_load_write_idx();
 
@@ -389,7 +389,7 @@ void calc_global_load_tick(struct rq *this_rq)
 	if (time_before(jiffies, this_rq->calc_load_update))
 		return;
 
-	delta  = calc_load_fold_active(this_rq);
+	delta  = calc_load_fold_active(this_rq, 0);
 	if (delta)
 		atomic_long_add(delta, &calc_load_tasks);
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7cbeb92a1cb9..898c0d2f18fe 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -28,7 +28,7 @@ extern unsigned long calc_load_update;
 extern atomic_long_t calc_load_tasks;
 
 extern void calc_global_load_tick(struct rq *this_rq);
-extern long calc_load_fold_active(struct rq *this_rq);
+extern long calc_load_fold_active(struct rq *this_rq, long adjust);
 
 #ifdef CONFIG_SMP
 extern void cpu_load_update_active(struct rq *this_rq);