Message-ID: <e3aa7e66-27d0-034b-7bdf-f4fab1c2df25@arm.com>
Date: Mon, 3 Apr 2023 13:40:32 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: David Dai <davidai@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>
Cc: Saravana Kannan <saravanak@...gle.com>, kernel-team@...roid.com,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/6] sched/fair: Add util_guest for tasks

Hi David,

On 31/03/2023 00:43, David Dai wrote:
> For virtualization usecases, util_est and util_avg currently tracked
> on the host aren't sufficient to accurately represent the workload on
> vCPU threads, which results in poor frequency selection and performance.
> For example, when a large workload migrates from a busy vCPU thread to
> an idle vCPU thread, it incurs additional DVFS ramp-up latencies
> as util accumulates.
>
> Introduce a new "util_guest" member as an additional PELT signal that's
> independently updated by the guest. When used, it's max aggregated to
> provide a boost to both task_util and task_util_est.
>
> Updating task_util and task_util_est will ensure:
> -Better task placement decisions for vCPU threads on the host
> -Correctly updating util_est.ewma during dequeue
> -Additive util with other threads on the same runqueue for more
> accurate frequency responses
>
> Co-developed-by: Saravana Kannan <saravanak@...gle.com>
> Signed-off-by: Saravana Kannan <saravanak@...gle.com>
> Signed-off-by: David Dai <davidai@...gle.com>
> ---
[...]
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6986ea31c984..998649554344 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4276,14 +4276,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf);
>
> static inline unsigned long task_util(struct task_struct *p)
> {
> - return READ_ONCE(p->se.avg.util_avg);
> + return max(READ_ONCE(p->se.avg.util_avg),
> + READ_ONCE(p->se.avg.util_guest));
> }
>
> static inline unsigned long _task_util_est(struct task_struct *p)
> {
> struct util_est ue = READ_ONCE(p->se.avg.util_est);
>
> - return max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED));
> + return max_t(unsigned long, READ_ONCE(p->se.avg.util_guest),
> + max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED)));
> }

I can't see why the existing p->uclamp_req[UCLAMP_MIN].value can't be
used here instead of p->se.avg.util_guest.
I do understand the issue of inheriting uclamp values at fork, but I
don't get the "not being additive" argument. We are at task level here.
The fact that you have to max-aggregate util_guest with util_avg and
util_est directly in task_util() and _task_util_est() tells me that
there are places where this helps but where uclamp_task_util() is not
called.
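
Concretely, I was thinking of something along these lines (untested
sketch, assumes CONFIG_UCLAMP_TASK and that the VMM forwards the guest
hint as the vCPU thread's uclamp_min request):

static inline unsigned long task_util(struct task_struct *p)
{
	/* Guest hint arrives via the existing per-task uclamp_min request. */
	return max(READ_ONCE(p->se.avg.util_avg),
		   (unsigned long)p->uclamp_req[UCLAMP_MIN].value);
}
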
When you say in the cover letter that you tried uclamp_min, how exactly
did you use it? Did you run existing mainline code, or did you use
uclamp_min as a replacement for util_guest within this patch?
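I.e. did the VMM set it from userspace per vCPU thread, roughly like
below (hypothetical sketch; the helper name and the way the guest hint
is obtained are made up)?

#include <linux/sched.h>	/* SCHED_FLAG_UTIL_CLAMP_MIN */
#include <linux/sched/types.h>	/* struct sched_attr */
#include <sys/syscall.h>
#include <unistd.h>

static int set_vcpu_uclamp_min(pid_t vcpu_tid, unsigned int guest_util)
{
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_flags	= SCHED_FLAG_UTIL_CLAMP_MIN |
				  SCHED_FLAG_KEEP_ALL,
		.sched_util_min	= guest_util,	/* 0..1024 */
	};

	/* sched_setattr() has no glibc wrapper. */
	return syscall(SYS_sched_setattr, vcpu_tid, &attr, 0);
}
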
> static inline unsigned long task_util_est(struct task_struct *p)
> @@ -6242,6 +6244,15 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> */
> util_est_enqueue(&rq->cfs, p);
>
> + /*
> + * The normal code path for host thread enqueue doesn't take into
> + * account guest task migrations when updating cpufreq util.
> + * So, always update the cpufreq when a vCPU thread has a
> + * non-zero util_guest value.
> + */
> + if (READ_ONCE(p->se.avg.util_guest))
> + cpufreq_update_util(rq, 0);

Is this because the enqueue_entity() -> update_load_avg() ->
attach_entity_load_avg() -> cfs_rq_util_change() path only calls
cpufreq_update_util() for the root run-queue (&rq->cfs == cfs_rq)?
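I.e. because of this check (simplified from mainline
cfs_rq_util_change()):

static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq, int flags)
{
	struct rq *rq = rq_of(cfs_rq);

	/* Only the root cfs_rq triggers a cpufreq update. */
	if (&rq->cfs == cfs_rq)
		cpufreq_update_util(rq, flags);
}
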
[...]