Date:	Thu, 02 Jun 2016 09:59:48 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Peter Zijlstra <peterz@...radead.org>,
	Wanpeng Li <kernellwp@...il.com>
Cc:	linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
	Wanpeng Li <wanpeng.li@...mail.com>,
	Ingo Molnar <mingo@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Paolo Bonzini <pbonzini@...hat.com>, Radim <rkrcmar@...hat.com>
Subject: Re: [PATCH] sched/cputime: add steal clock warps handling during
 cpu hotplug

On Thu, 2016-06-02 at 14:00 +0200, Peter Zijlstra wrote:
> On Thu, Jun 02, 2016 at 07:57:19PM +0800, Wanpeng Li wrote:
> > 
> > From: Wanpeng Li <wanpeng.li@...mail.com>
> > 
> > I observed that sometimes st is 100% instantaneous, then idle is
> > 100% even if there is a cpu hog on the guest cpu after the cpu
> > hotplug comes back (N.B. both guest and host are latest 4.7-rc1;
> > this cannot always be readily reproduced). I added a trace to
> > capture it, as below:
> > 
> > cpuhp/1-12    [001] d.h1   167.461657: account_process_tick: steal = 1291385514, prev_steal_time = 0
> > cpuhp/1-12    [001] d.h1   167.461659: account_process_tick: steal_jiffies = 1291
> > <idle>-0     [001] d.h1   167.462663: account_process_tick: steal = 18732255, prev_steal_time = 1291000000
> > <idle>-0     [001] d.h1   167.462664: account_process_tick: steal_jiffies = 18446744072437
> > 
> > The steal clock warps and then steal_jiffies overflows. This patch
> > aligns prev_steal_time to the new steal clock timestamp, in order
> > to avoid the overflow so that st accounting can continue to work.
> I would rather suggest fixing the steal clock thing to not jump like
> that; is that at all possible?

Not always possible, I suspect.

If a guest is saved to disk and later restored (e.g. after
a host reboot), or live migrated to another host, I would
expect to get totally disjoint steal time statistics from
the "new run" of the guest (which is the same run of the
guest OS).

In fact, this code may also need to deal with the case
where steal time suddenly increases by a ludicrous amount,
and ignore those events, too.

A safe threshold might be to only apply steal times that
are positive and smaller than one second (as long as nohz_full
has the one second timer tick left), ignoring intervals that
are negative or longer than a second, and using those to sync
up the guest with the host.
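
The threshold idea above could be sketched roughly as below. This is
only an illustration in userspace C, not the actual cputime code: the
names steal_state, steal_delta_naive() and steal_delta_filtered() are
made up for the example, and NSEC_PER_JIFFY assumes HZ=1000, which is
what the trace suggests (1291385514 ns became 1291 jiffies). The naive
helper shows how the unsigned subtraction wraps when the clock goes
backwards; the filtered one drops negative or over-one-second
intervals and resyncs to the new clock reading.

```c
#include <stdint.h>

#define NSEC_PER_SEC   1000000000ULL
#define NSEC_PER_JIFFY 1000000ULL   /* assumption: HZ=1000 */

/* Hypothetical per-cpu state, named after prev_steal_time in the trace. */
struct steal_state {
	uint64_t prev_steal_ns;
};

/*
 * Unfiltered delta. If the steal clock goes backwards, the unsigned
 * subtraction wraps around to a value close to 2^64.
 */
static uint64_t steal_delta_naive(uint64_t prev_ns, uint64_t steal_ns)
{
	return steal_ns - prev_ns;
}

/*
 * Return the steal time (in ns) to account for this sample, or 0 if
 * the interval is implausible. A backwards jump wraps to a huge
 * unsigned value, so the single "< NSEC_PER_SEC" comparison rejects
 * both negative intervals and ones longer than a second.
 * In every case the state is resynced to the new clock reading.
 */
static uint64_t steal_delta_filtered(struct steal_state *s, uint64_t steal_ns)
{
	uint64_t delta = steal_ns - s->prev_steal_ns;

	s->prev_steal_ns = steal_ns;

	if (delta == 0 || delta >= NSEC_PER_SEC)
		return 0;

	return delta;
}
```

With the values from the trace, the naive delta for steal = 18732255
and prev_steal_time = 1291000000 divided by NSEC_PER_JIFFY gives
18446744072437, the bogus steal_jiffies value reported. The filtered
version ignores both the initial 1.29 second jump and the backwards
warp, and resumes normal accounting on the next sample.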

-- 
All Rights Reversed.

