linux-kernel - [PATCH] time,virt: resync steal time when guest & host lose sync

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 10 Aug 2016 12:52:12 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Wanpeng Li <kernellwp@...il.com>
Cc:	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Wanpeng Li <wanpeng.li@...mail.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Radim Krcmar <rkrcmar@...hat.com>,
	Mike Galbraith <efault@....de>
Subject: [PATCH] time,virt: resync steal time when guest & host lose sync

On Wed, 10 Aug 2016 07:39:08 +0800
Wanpeng Li <kernellwp@...il.com> wrote:

> The regression is caused by your commit "sched,time: Count actually
> elapsed irq & softirq time".

Wanpeng, does this patch fix your issue?

Paolo, what is your opinion on this issue?

I can think of all kinds of ways in which guest and host might lose
sync with steal time, from uninitialized values at boot, to guest
pause, followed by save to disk, and reload, to live migration, to...

---8<---

Subject: time,virt: resync steal time when guest & host lose sync

When guest and host wildly disagree on steal time, a guest can
do several things:
1) Quickly account all the steal time at once (the kernel did this before
   57430218317e ("sched/cputime: Count actually elapsed irq & softirq time"),
   when steal_account_process_ticks got ULONG_MAX as its maximum value.
2) Stay out of sync for an indeterminate amount of time. This is what the
   system does today.
3) Sync up the guest value to the host-provided value, without accounting
   an absurdly large value in the cpu time statistics.

This patch makes the kernel do (3), which seems like the right thing
to do.

The exact value of the threshold use probably does not matter too much,
as long as it is long enough to cover all the timer ticks that passed
during an idle period, because (irqtime_)account_idle_ticks can process
a large amount of time all at once.

Signed-off-by: Rik van Riel <riel@...hat.com>
Reported-by: Wanpeng Li <kernellwp@...il.com>
---
 kernel/sched/cputime.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 1934f658c036..c18f9e717af6 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -273,7 +273,17 @@ static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
 		steal = paravirt_steal_clock(smp_processor_id());
 		steal -= this_rq()->prev_steal_time;

-		steal_cputime = min(nsecs_to_cputime(steal), maxtime);
+		steal_cputime = nsecs_to_cputime(steal);
+		if (steal_cputime > 32 * maxtime) {
+			/*
+			 * Guest and host steal time values are way out of
+			 * sync. Sync up the guest steal time with the host.
+			 */
+			this_rq()->prev_steal_time +=
+					cputime_to_nsecs(steal_cputime);
+			return 0;
+		}
+		steal_cputime = min(steal_cputime, maxtime);
 		account_steal_time(steal_cputime);
 		this_rq()->prev_steal_time += cputime_to_nsecs(steal_cputime);