Message-ID: <e3aa7e66-27d0-034b-7bdf-f4fab1c2df25@arm.com>
Date: Mon, 3 Apr 2023 13:40:32 +0200
From: Dietmar Eggemann <dietmar.eggemann@....com>
To: David Dai <davidai@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>
Cc: Saravana Kannan <saravanak@...gle.com>, kernel-team@...roid.com,
linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/6] sched/fair: Add util_guest for tasks

Hi David,

On 31/03/2023 00:43, David Dai wrote:
> For virtualization usecases, util_est and util_avg currently tracked
> on the host aren't sufficient to accurately represent the workload on
> vCPU threads, which results in poor frequency selection and performance.
> For example, when a large workload migrates from a busy vCPU thread to
> an idle vCPU thread, it incurs additional DVFS ramp-up latencies
> as util accumulates.
>
> Introduce a new "util_guest" member as an additional PELT signal that's
> independently updated by the guest. When used, it's max aggregated to
> provide a boost to both task_util and task_util_est.
>
> Updating task_util and task_util_est will ensure:
> -Better task placement decisions for vCPU threads on the host
> -Correctly updating util_est.ewma during dequeue
> -Additive util with other threads on the same runqueue for more
> accurate frequency responses
>
> Co-developed-by: Saravana Kannan <saravanak@...gle.com>
> Signed-off-by: Saravana Kannan <saravanak@...gle.com>
> Signed-off-by: David Dai <davidai@...gle.com>
> ---
[...]
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6986ea31c984..998649554344 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4276,14 +4276,16 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf);
>
> static inline unsigned long task_util(struct task_struct *p)
> {
> - return READ_ONCE(p->se.avg.util_avg);
> + return max(READ_ONCE(p->se.avg.util_avg),
> + READ_ONCE(p->se.avg.util_guest));
> }
>
> static inline unsigned long _task_util_est(struct task_struct *p)
> {
> struct util_est ue = READ_ONCE(p->se.avg.util_est);
>
> - return max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED));
> + return max_t(unsigned long, READ_ONCE(p->se.avg.util_guest),
> + max(ue.ewma, (ue.enqueued & ~UTIL_AVG_UNCHANGED)));
> }

I can't see why the existing p->uclamp_req[UCLAMP_MIN].value can't be
used here instead of p->se.avg.util_guest.
I do understand the issue of inheriting uclamp values at fork, but I
don't get the "not being additive" argument. We are at task level here.
The fact that you have to max-aggregate util_guest with util_avg and
util_est directly in task_util() and _task_util_est() tells me that
there are places where this helps but where uclamp_task_util() is not
called.
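
Concretely, I was thinking of something along these lines (untested
sketch, assumes CONFIG_UCLAMP_TASK and that the VMM forwards the guest
hint as the vCPU thread's uclamp_min request):

static inline unsigned long task_util(struct task_struct *p)
{
	/* Guest hint arrives via the existing per-task uclamp_min request. */
	return max(READ_ONCE(p->se.avg.util_avg),
		   (unsigned long)p->uclamp_req[UCLAMP_MIN].value);
}
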
When you say in the cover letter that you tried uclamp_min, how exactly
did you use it? Did you run existing mainline code, or did you use
uclamp_min as a replacement for util_guest within this patch?
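I.e. did the VMM set it from userspace per vCPU thread, roughly like
below (hypothetical sketch; the helper name and the way the guest hint
is obtained are made up)?

#include <linux/sched.h>	/* SCHED_FLAG_UTIL_CLAMP_MIN */
#include <linux/sched/types.h>	/* struct sched_attr */
#include <sys/syscall.h>
#include <unistd.h>

static int set_vcpu_uclamp_min(pid_t vcpu_tid, unsigned int guest_util)
{
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_flags	= SCHED_FLAG_UTIL_CLAMP_MIN |
				  SCHED_FLAG_KEEP_ALL,
		.sched_util_min	= guest_util,	/* 0..1024 */
	};

	/* sched_setattr() has no glibc wrapper. */
	return syscall(SYS_sched_setattr, vcpu_tid, &attr, 0);
}
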
> static inline unsigned long task_util_est(struct task_struct *p)
> @@ -6242,6 +6244,15 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> */
> util_est_enqueue(&rq->cfs, p);
>
> + /*
> + * The normal code path for host thread enqueue doesn't take into
> + * account guest task migrations when updating cpufreq util.
> + * So, always update the cpufreq when a vCPU thread has a
> + * non-zero util_guest value.
> + */
> + if (READ_ONCE(p->se.avg.util_guest))
> + cpufreq_update_util(rq, 0);

Is this because the enqueue_entity() -> update_load_avg() ->
attach_entity_load_avg() -> cfs_rq_util_change() path only calls
cpufreq_update_util() for the root run-queue (&rq->cfs == cfs_rq)?
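I.e. because of this check (simplified from mainline
cfs_rq_util_change()):

static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq, int flags)
{
	struct rq *rq = rq_of(cfs_rq);

	/* Only the root cfs_rq triggers a cpufreq update. */
	if (&rq->cfs == cfs_rq)
		cpufreq_update_util(rq, flags);
}
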
[...]