Message-ID: <tip-e9532e69b8d1d1284e8ecf8d2586de34aec61244@git.kernel.org>
Date: Sat, 5 Mar 2016 03:27:38 -0800
From: tip-bot for Thomas Gleixner <tipbot@...or.com>
To: linux-tip-commits@...r.kernel.org
Cc: glommer@...allels.com, fweisbec@...il.com, peterz@...radead.org,
linux-kernel@...r.kernel.org, hpa@...or.com, mingo@...nel.org,
torvalds@...ux-foundation.org, stable@...r.kernel.org,
riel@...hat.com, tglx@...utronix.de
Subject: [tip:sched/urgent] sched/cputime: Fix steal time accounting vs. CPU
hotplug
Commit-ID: e9532e69b8d1d1284e8ecf8d2586de34aec61244
Gitweb: http://git.kernel.org/tip/e9532e69b8d1d1284e8ecf8d2586de34aec61244
Author: Thomas Gleixner <tglx@...utronix.de>
AuthorDate: Fri, 4 Mar 2016 15:59:42 +0100
Committer: Ingo Molnar <mingo@...nel.org>
CommitDate: Sat, 5 Mar 2016 09:17:20 +0100
sched/cputime: Fix steal time accounting vs. CPU hotplug

On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value across CPU down and up. So after the CPU comes up again, the delta
calculation in steal_account_process_tick() goes badly wrong due to the
unsigned math:

	u64 steal = paravirt_steal_clock(smp_processor_id());
	steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time, the subtraction wraps around
and we end up with an insanely large value, which then gets added to
rq->prev_steal_time, permanently wrecking the accounting. As a consequence the
per-CPU stats in /proc/stat become stale.
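
The wraparound described above can be reproduced in user space. The following
is only a minimal sketch, not kernel code: steal_delta() is a hypothetical
helper, uint64_t stands in for u64, and the two arguments stand in for the
paravirt_steal_clock() reading and rq->prev_steal_time.

```c
#include <stdint.h>

/*
 * Hypothetical user-space model of the quoted delta calculation.
 * steal_clock models the value read from paravirt_steal_clock(),
 * prev_steal_time models rq->prev_steal_time.
 */
static uint64_t steal_delta(uint64_t steal_clock, uint64_t prev_steal_time)
{
	uint64_t steal = steal_clock;

	/* Unsigned subtraction: wraps modulo 2^64 if steal < prev_steal_time. */
	steal -= prev_steal_time;
	return steal;
}
```

With a current reading of 1000 and a previous value of 10 this returns 990.
With a reading of 10 and a stale previous value of 1000 it returns 2^64 - 990,
which is the "insanely large" delta.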
Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.
None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.

The solution is simple: reset rq->prev_*_time when the CPU is plugged in again.

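
The effect of the reset can be sketched in user space as follows. This is an
illustration under assumptions, not the patch itself: struct fake_rq,
fake_account_reset_rq() and fake_steal_delta() are hypothetical names, and the
sketch clears all three fields unconditionally, whereas the real helper guards
each field with its config option.

```c
#include <stdint.h>

/* Hypothetical stand-in for the struct rq fields that carry previous values. */
struct fake_rq {
	uint64_t prev_irq_time;
	uint64_t prev_steal_time;
	uint64_t prev_steal_time_rq;
};

/* Models the reset done when the CPU is plugged in again. */
static void fake_account_reset_rq(struct fake_rq *rq)
{
	rq->prev_irq_time = 0;
	rq->prev_steal_time = 0;
	rq->prev_steal_time_rq = 0;
}

/* Same unsigned delta as in the commit message, against the modelled rq. */
static uint64_t fake_steal_delta(const struct fake_rq *rq, uint64_t steal_clock)
{
	return steal_clock - rq->prev_steal_time;
}
```

Once the previous value is zero, the delta equals the clock reading itself, so
the subtraction cannot wrap regardless of what was stored before the CPU went
down.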
Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
Acked-by: Rik van Riel <riel@...hat.com>
Cc: <stable@...r.kernel.org>
Cc: Frederic Weisbecker <fweisbec@...il.com>
Cc: Glauber Costa <glommer@...allels.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Peter Zijlstra <peterz@...radead.org>
Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar <mingo@...nel.org>
---
kernel/sched/core.c | 1 +
kernel/sched/sched.h | 13 +++++++++++++
2 files changed, 14 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ab814bf..406182a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5627,6 +5627,7 @@ migration_call(struct notifier_block *nfb, unsigned long action, void *hcpu)
 	case CPU_UP_PREPARE:
 		rq->calc_load_update = calc_load_update;
+		account_reset_rq(rq);
 		break;
 	case CPU_ONLINE:
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 30ea2d8..4f6598a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1738,3 +1738,16 @@ static inline u64 irq_time_read(int cpu)
 }
 #endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+static inline void account_reset_rq(struct rq *rq)
+{
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	rq->prev_irq_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT
+	rq->prev_steal_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+	rq->prev_steal_time_rq = 0;
+#endif
+}