Date:	Wed, 10 Aug 2016 01:07:41 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Wanpeng Li <kernellwp@...il.com>
Cc:	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Wanpeng Li <wanpeng.li@...mail.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Radim Krcmar <rkrcmar@...hat.com>,
	Mike Galbraith <efault@....de>
Subject: Re: [PATCH 1/5] sched,time: Count actually elapsed irq & softirq time

On Wed, 2016-08-10 at 07:39 +0800, Wanpeng Li wrote:
> 2016-08-10 7:25 GMT+08:00 Wanpeng Li <kernellwp@...il.com>:
> > 2016-08-09 22:06 GMT+08:00 Rik van Riel <riel@...hat.com>:
> > > On Tue, 2016-08-09 at 11:59 +0800, Wanpeng Li wrote:
> > > > Hi Rik,
> > > > 2016-07-13 22:50 GMT+08:00 Frederic Weisbecker <fweisbec@...il.com>:
> > > > > From: Rik van Riel <riel@...hat.com>
> > > > > 
> > > > > Currently, if there was any irq or softirq time during 'ticks'
> > > > > jiffies, the entire period will be accounted as irq or softirq
> > > > > time.
> > > > > 
> > > > > This is inaccurate if only a subset of the time was actually
> > > > > spent handling irqs, and could conceivably mis-count all of the
> > > > > ticks during a period as irq time, when there was some irq and
> > > > > some softirq time.
> > > > > 
> > > > > This can actually happen when irqtime_account_process_tick is
> > > > > called from account_idle_ticks, which can pass a larger number
> > > > > of ticks down all at once.
> > > > > 
> > > > > Fix this by changing irqtime_account_hi_update,
> > > > > irqtime_account_si_update, and steal_account_process_ticks to
> > > > > work with cputime_t time units, and return the amount of time
> > > > > spent in each mode.
> > > > 
> > > > Do we need to subtract steal cputime from idle cputime in
> > > > account_idle_ticks() when noirqtime is true? I tried to add this
> > > > logic with the noirqtime and idle=poll boot parameters for a full
> > > > dynticks guest, but it made no difference. What am I missing?
> > > 
> > > Yes, you are right. The code in account_idle_ticks()
> > > could use the same treatment.
> > > 
> > > I am not sure why it would not work, though...
> > 
> > Actually I observed a regression caused by this patch. I use an i5
> 
> The regression is caused by your commit "sched,time: Count actually
> elapsed irq & softirq time".

Wanpeng and I discussed this issue, and discovered
that this bug is triggered by my patch, specifically
this bit:

-       if (steal_account_process_tick(ULONG_MAX))
+       other = account_other_time(cputime);
+       if (other >= cputime)
                return;

Replacing "cputime" with "ULONG_MAX" as the argument
to account_other_time makes the bug disappear.

However, this is not the cause of the bug.

The cause of the bug appears to be that the values
used to figure out how much steal time has passed
are never initialized.

                steal = paravirt_steal_clock(smp_processor_id());
                steal -= this_rq()->prev_steal_time;

The first of the two may be initialized by the host
(I did not verify that), but the second one does not
have any explicit initializers anywhere in the kernel
tree.

This can lead to an arbitrarily large difference between
paravirt_steal_clock(smp_processor_id()) and
this_rq()->prev_steal_time, which results in nothing but
steal time getting accounted, potentially for a very long
time.

Previously we carried this patch to initialize the
various rq->prev_* values at CPU hotplug time:

https://patchwork.codeaurora.org/patch/27699/

Which got reverted by Paolo here:

https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=3d89e5478bf550a50c99e93adf659369798263b0

Which leads me to this question:

Paolo, how would you like us to fix this bug?

It seems like the host and guest steal time CAN get out
of sync, sometimes by a ridiculous amount, and we need
some way to get the excessive difference out of the way,
without it getting accounted as steal time (not immediately,
and not over the next 17 hours, or months).

-- 

All Rights Reversed.