Date:	Wed, 10 Aug 2016 01:07:41 -0400
From:	Rik van Riel <riel@...hat.com>
To:	Wanpeng Li <kernellwp@...il.com>
Cc:	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Paolo Bonzini <pbonzini@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Wanpeng Li <wanpeng.li@...mail.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Radim Krcmar <rkrcmar@...hat.com>,
	Mike Galbraith <efault@....de>
Subject: Re: [PATCH 1/5] sched,time: Count actually elapsed irq & softirq time

On Wed, 2016-08-10 at 07:39 +0800, Wanpeng Li wrote:
> 2016-08-10 7:25 GMT+08:00 Wanpeng Li <kernellwp@...il.com>:
> > 2016-08-09 22:06 GMT+08:00 Rik van Riel <riel@...hat.com>:
> > > On Tue, 2016-08-09 at 11:59 +0800, Wanpeng Li wrote:
> > > > Hi Rik,
> > > > 2016-07-13 22:50 GMT+08:00 Frederic Weisbecker <fweisbec@...il.com>:
> > > > > From: Rik van Riel <riel@...hat.com>
> > > > > 
> > > > > Currently, if there was any irq or softirq time during 'ticks'
> > > > > jiffies, the entire period will be accounted as irq or softirq
> > > > > time.
> > > > > 
> > > > > This is inaccurate if only a subset of the time was actually
> > > > > spent handling irqs, and could conceivably mis-count all of the
> > > > > ticks during a period as irq time, when there was some irq and
> > > > > some softirq time.
> > > > > 
> > > > > This can actually happen when irqtime_account_process_tick is
> > > > > called from account_idle_ticks, which can pass a larger number
> > > > > of ticks down all at once.
> > > > > 
> > > > > Fix this by changing irqtime_account_hi_update,
> > > > > irqtime_account_si_update, and steal_account_process_ticks to
> > > > > work with cputime_t time units, and return the amount of time
> > > > > spent in each mode.
> > > > 
> > > > Do we need to subtract steal cputime from idle cputime in
> > > > account_idle_ticks() when noirqtime is true? I tried to add this
> > > > logic with the noirqtime and idle=poll boot parameters for a full
> > > > dynticks guest, but it made no difference. What am I missing?
> > > 
> > > Yes, you are right. The code in account_idle_ticks()
> > > could use the same treatment.
> > > 
> > > I am not sure why it would not work, though...
> > 
> > Actually I observed a regression caused by this patch. I use an i5
> 
> The regression is caused by your commit "sched,time: Count actually
> elapsed irq & softirq time".

Wanpeng and I discussed this issue, and discovered
that this bug is triggered by my patch, specifically
this bit:

-       if (steal_account_process_tick(ULONG_MAX))
+       other = account_other_time(cputime);
+       if (other >= cputime)
                return;

Replacing "cputime" with "ULONG_MAX" as the argument
to account_other_time makes the bug disappear.

However, this is not the cause of the bug.

The cause of the bug appears to be that the values
used to figure out how much steal time has passed
are never initialized.

                steal = paravirt_steal_clock(smp_processor_id());
                steal -= this_rq()->prev_steal_time;

The first of the two may be initialized by the host
(I did not verify that), but the second one does not
have any explicit initializers anywhere in the kernel
tree.

This can lead to an arbitrarily large difference between
paravirt_steal_clock(smp_processor_id()) and
this_rq()->prev_steal_time, which results in nothing but
steal time getting accounted, potentially for a very long
time.

Previously we carried this patch to initialize the
various rq->prev_* values at CPU hotplug time:

https://patchwork.codeaurora.org/patch/27699/

Which got reverted by Paolo here:

https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=sched/core&id=3d89e5478bf550a50c99e93adf659369798263b0

Which leads me to this question:

Paolo, how would you like us to fix this bug?

It seems like the host and guest steal time CAN get out
of sync, sometimes by a ridiculous amount, and we need
some way to get the excessive difference out of the way,
without it getting accounted as steal time (not immediately,
and not over the next 17 hours, or months).

-- 

All Rights Reversed.