Message-ID: <20240820094555.7gdb5ado35syu5me@linux.ibm.com>
Date: Tue, 20 Aug 2024 15:15:55 +0530
From: Srikar Dronamraju <srikar@...ux.ibm.com>
To: Suleiman Souhlal <suleiman@...gle.com>
Cc: Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
Paolo Bonzini <pbonzini@...hat.com>, joelaf@...gle.com,
vineethrp@...gle.com, linux-kernel@...r.kernel.org,
kvm@...r.kernel.org, ssouhlal@...ebsd.org
Subject: Re: [PATCH] sched: Don't try to catch up excess steal time.
* Suleiman Souhlal <suleiman@...gle.com> [2024-08-06 20:11:57]:
> When steal time exceeds the measured delta when updating clock_task, we
> currently try to catch up the excess in future updates.
> However, this results in inaccurate run times for the future clock_task
> measurements, as they end up getting additional steal time that did not
> actually happen, from the previous excess steal time being paid back.
>
> For example, suppose a task in a VM runs for 10ms and had 15ms of steal
> time reported while it ran. clock_task rightly doesn't advance. Then, a
> different task runs on the same rq for 10ms without any time stolen.
> Because of the current catch-up mechanism, clock_task inaccurately ends
> up advancing by only 5ms instead of 10ms even though there wasn't any
> actual time stolen. The second task is getting charged for less time
> than it ran, even though it didn't deserve it.
> In other words, tasks can end up getting more run time than they should
> actually get.
>
> So, we instead don't make future updates pay back past excess stolen time.
>
> Signed-off-by: Suleiman Souhlal <suleiman@...gle.com>
> ---
> kernel/sched/core.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bcf2c4cc0522..42b37da2bda6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -728,13 +728,15 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
>  #endif
>  #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
>  	if (static_key_false((&paravirt_steal_rq_enabled))) {
> -		steal = paravirt_steal_clock(cpu_of(rq));
> +		u64 prev_steal;
> +
> +		steal = prev_steal = paravirt_steal_clock(cpu_of(rq));
>  		steal -= rq->prev_steal_time_rq;
>
>  		if (unlikely(steal > delta))
>  			steal = delta;
>
> -		rq->prev_steal_time_rq += steal;
> +		rq->prev_steal_time_rq = prev_steal;
>  		delta -= steal;
>  	}
>  #endif
Agree with the change.
Probably, we could have achieved the same by just moving one line up.
Something like this?
#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
	if (static_key_false((&paravirt_steal_rq_enabled))) {
		steal = paravirt_steal_clock(cpu_of(rq));
		steal -= rq->prev_steal_time_rq;
		rq->prev_steal_time_rq += steal;
		if (unlikely(steal > delta))
			steal = delta;
		delta -= steal;
	}
#endif
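
For reference, adding the unclamped steal back before the clamp leaves
rq->prev_steal_time_rq equal to the raw steal clock reading, which is the
same state the patch reaches with the direct assignment, so the two
variants should behave identically. Below is a minimal userspace sketch of
the before/after accounting, using the 10ms/15ms numbers from the
changelog example; the helper names and values are illustrative
assumptions, not kernel code:

/*
 * Minimal userspace sketch of the steal accounting in
 * update_rq_clock_task(), before and after the patch. The function
 * names and the 10ms/15ms numbers are illustrative assumptions taken
 * from the changelog example, not kernel code.
 */
#include <stdio.h>

typedef unsigned long long u64;

static u64 prev_steal_time_rq;

/* Current behaviour: only the clamped steal is added to prev_steal_time_rq. */
static u64 clock_task_advance_old(u64 steal_clock, u64 delta)
{
	u64 steal = steal_clock - prev_steal_time_rq;

	if (steal > delta)
		steal = delta;
	prev_steal_time_rq += steal;

	return delta - steal;	/* how much clock_task advances */
}

/* Patched behaviour: prev_steal_time_rq tracks the raw steal clock. */
static u64 clock_task_advance_new(u64 steal_clock, u64 delta)
{
	u64 steal = steal_clock - prev_steal_time_rq;

	if (steal > delta)
		steal = delta;
	prev_steal_time_rq = steal_clock;

	return delta - steal;
}

int main(void)
{
	u64 t1, t2;

	/* Task 1 runs 10ms with 15ms of steal, task 2 runs 10ms with none. */
	prev_steal_time_rq = 0;
	t1 = clock_task_advance_old(15, 10);
	t2 = clock_task_advance_old(15, 10);
	printf("old: task1=%llums task2=%llums\n", t1, t2);	/* 0ms, 5ms */

	prev_steal_time_rq = 0;
	t1 = clock_task_advance_new(15, 10);
	t2 = clock_task_advance_new(15, 10);
	printf("new: task1=%llums task2=%llums\n", t1, t2);	/* 0ms, 10ms */

	return 0;
}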
--
Thanks and Regards
Srikar Dronamraju