lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.11.1603041539490.3686@nanos>
Date:	Fri, 4 Mar 2016 15:59:42 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	LKML <linux-kernel@...r.kernel.org>
cc:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Glauber Costa <glommer@...allels.com>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Rik van Riel <riel@...hat.com>
Subject: [PATCH] sched/cputime: Fix steal time accounting vs. cpu hotplug

On cpu hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over cpu down and up. So after the cpu comes up again the delta
calculation in steal_account_process_tick() wreckages itself due to the
unsigned math:

	 u64 steal = paravirt_steal_clock(smp_processor_id());
	 
	 steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time we end up with an insane large
value which then gets added to rq->prev_steal_time, resulting in a permanent
wreckage of the accounting. As a consequence the per cpu stats in /proc/stat
become stale.

Nice trick to tell the world how idle the system is (100%) while the cpu is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions is reset at cpu hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.

Solution is simple: Reset rq->prev_*_time when the cpu is plugged in again.

Fixes: commit e6e6685accfa "KVM guest: Steal time accounting"
Fixes: commit 095c0aa83e52 "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa483808516c "sched: Remove irq time from available CPU power"
Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
Cc: stable@...r.kernel.org

---
 kernel/sched/core.c  |    1 +
 kernel/sched/sched.h |   13 +++++++++++++
 2 files changed, 14 insertions(+)

Index: b/kernel/sched/core.c
===================================================================
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5627,6 +5627,7 @@ migration_call(struct notifier_block *nf
 
 	case CPU_UP_PREPARE:
 		rq->calc_load_update = calc_load_update;
+		account_reset_rq(rq);
 		break;
 
 	case CPU_ONLINE:
Index: b/kernel/sched/sched.h
===================================================================
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1738,3 +1738,16 @@ static inline u64 irq_time_read(int cpu)
 }
 #endif /* CONFIG_64BIT */
 #endif /* CONFIG_IRQ_TIME_ACCOUNTING */
+
+static inline void account_reset_rq(struct rq *rq)
+{
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+	rq->prev_irq_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT
+	rq->prev_steal_time = 0;
+#endif
+#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
+	rq->prev_steal_time_rq = 0;
+#endif
+}

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ