[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <56DBAF6D.5010503@mail.usask.ca>
Date: Sat, 05 Mar 2016 22:17:49 -0600
From: Chris Friesen <cbf123@...l.usask.ca>
To: Frederic Weisbecker <fweisbec@...il.com>,
Thomas Gleixner <tglx@...utronix.de>
CC: John Stultz <john.stultz@...aro.org>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
lkml <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>, Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH] steal_account_process_tick() should return jiffies
On 03/05/2016 07:19 AM, Frederic Weisbecker wrote:
> On Sat, Mar 05, 2016 at 11:27:01AM +0100, Thomas Gleixner wrote:
>> Chris,
>>
>> On Fri, 4 Mar 2016, Chris Friesen wrote:
>>
>> First of all the subject line should contain a subsystem prefix,
>> i.e. "sched/cputime:"
>>
>>> The callers of steal_account_process_tick() expect it to return whether
>>> the last jiffy was stolen or not.
>>>
>>> Currently the return value of steal_account_process_tick() is in units
>>> of cputime, which vary between either jiffies or nsecs depending on
>>> CONFIG_VIRT_CPU_ACCOUNTING_GEN.
>>
>> Sure, but what is the actual problem? The return value is boolean and tells
>> whether there was stolen time accounted or not.
> Indeed the changelog should better explain the problem. So I think the issue is that
> if the cputime has nsecs granularity and we have a tiny stolen time to account (lets say
> a few nanosecs, in fact anything that is below a jiffy), we are not going to account the
> tick on user/system.
Yes, this is exactly it. Because of this, if CONFIG_VIRT_CPU_ACCOUNTING_GEN is
enabled in a guest then the idle/system/user stats in /proc/stat can show odd
values, and "top" shows nothing for user/system even if CPU hogs are running.
> But the fix doesn't look right to me because we are still accounting the steal time
> if it is lower than a jiffy and that steal time will never be substracted to user/system
> time if it never reach a jiffy.
>
> Instead the fix should accumulate the steal time and account it only once it's worth
> a jiffy and then substract it from system/user time accordingly.
Yes, on reflection you are correct, and the patch looks pretty close, except
that account_steal_time() is still expecting units of cputime. I'll send a
followup patch.
> Something like that:
>
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index b2ab2ff..d38e25f 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -262,7 +262,7 @@ static __always_inline bool steal_account_process_tick(void)
> #ifdef CONFIG_PARAVIRT
> if (static_key_false(¶virt_steal_enabled)) {
> u64 steal;
> - cputime_t steal_ct;
> + unsigned long steal_jiffies;
>
> steal = paravirt_steal_clock(smp_processor_id());
> steal -= this_rq()->prev_steal_time;
> @@ -272,11 +272,11 @@ static __always_inline bool steal_account_process_tick(void)
> * based on jiffies). Lets cast the result to cputime
> * granularity and account the rest on the next rounds.
> */
> - steal_ct = nsecs_to_cputime(steal);
> - this_rq()->prev_steal_time += cputime_to_nsecs(steal_ct);
> + steal_jiffies = nsecs_to_jiffies(steal);
> + this_rq()->prev_steal_time += jiffies_to_nsecs(steal_jiffies);
>
> account_steal_time(steal_ct);
> - return steal_ct;
> + return steal_jiffies;
> }
> #endif
> return false;
>
Powered by blists - more mailing lists