Open Source and information security mailing list archives
Message-ID: <b09848dc-dea8-46e7-9f24-c11c64fd5d74@arm.com>
Date:   Wed, 1 Nov 2023 17:06:30 +0100
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Hongyan Xia <Hongyan.Xia2@....com>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Juri Lelli <juri.lelli@...hat.com>
Cc:     Qais Yousef <qyousef@...alina.io>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Lukasz Luba <lukasz.luba@....com>,
        Christian Loehle <christian.loehle@....com>,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 2/6] sched/uclamp: Simulate PELT decay in
 util_avg_uclamp

On 04/10/2023 11:04, Hongyan Xia wrote:
> From: Hongyan Xia <hongyan.xia2@....com>
> 
> Because util_avg_uclamp is not directly managed by PELT, it lacks the
> nice property of slowly decaying to a lower value, resulting in
> performance degradation due to premature frequency drops.
> 
> Add functions to decay root cfs utilization and tasks that are not on
> the rq. This way, we get the benefits of PELT while still maintaining
> uclamp. The rules are simple:
> 
> 1. When task is se->on_rq, enforce its util_avg_uclamp within uclamp
>    range.
> 2. When task is !se->on_rq, PELT decay its util_avg_uclamp.
> 3. When the root CFS util drops, PELT decay to the target frequency
>    instead of immediately dropping to a lower target frequency.
> 
> TODO: Can we somehow integrate this uclamp sum aggregation directly into
> util_avg, so that we don't need to introduce a new util_avg_uclamp
> signal and don't need to simulate PELT decay?

That's a good question. I'm wondering why you were not able to integrate
the maintenance of the util_avg_uclamp values into the existing PELT
update functionality in fair.c (__update_load_avg_xxx(),
propagate_entity_load_avg() -> update_tg_cfs_util(), etc.).

Why do you need extra functions like ___decay_util_avg_uclamp_towards()
and ___update_util_avg_uclamp() for this?

> Signed-off-by: Hongyan Xia <hongyan.xia2@....com>
> ---
>  kernel/sched/fair.c  |  20 +++++++++
>  kernel/sched/pelt.c  | 103 ++++++++++++++++++++++++++++++++++++++++---
>  kernel/sched/sched.h |   2 +
>  3 files changed, 119 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 33e5a6e751c0..420af57d01ee 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4311,17 +4311,22 @@ static inline int
>  update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  {
>  	unsigned long removed_load = 0, removed_util = 0, removed_runnable = 0;
> +	unsigned int removed_root_util = 0;

 unsigned long removed_load = 0, removed_util = 0, removed_runnable = 0;
-unsigned int removed_root_util = 0;
+unsigned int __maybe_unused removed_root_util = 0;

Otherwise you get `warning: unused variable ‘removed_root_util’` w/
!CONFIG_UCLAMP_TASK

>  	struct sched_avg *sa = &cfs_rq->avg;
>  	int decayed = 0;
>  
>  	if (cfs_rq->removed.nr) {
>  		unsigned long r;
> +		struct rq *rq = rq_of(cfs_rq);
>  		u32 divider = get_pelt_divider(&cfs_rq->avg);
>  
>  		raw_spin_lock(&cfs_rq->removed.lock);
>  		swap(cfs_rq->removed.util_avg, removed_util);
>  		swap(cfs_rq->removed.load_avg, removed_load);
>  		swap(cfs_rq->removed.runnable_avg, removed_runnable);
> +#ifdef CONFIG_UCLAMP_TASK
> +		swap(rq->root_cfs_util_uclamp_removed, removed_root_util);
> +#endif
>  		cfs_rq->removed.nr = 0;
>  		raw_spin_unlock(&cfs_rq->removed.lock);
>  

[...]


>  #ifdef CONFIG_UCLAMP_TASK
> +static void ___decay_util_avg_uclamp_towards(u64 now,
> +					     u64 last_update_time,
> +					     u32 period_contrib,
> +					     unsigned int *old,
> +					     unsigned int new_val)
> +{
> +	unsigned int old_val = READ_ONCE(*old);
> +	u64 delta, periods;
> +
> +	if (old_val <= new_val) {
> +		WRITE_ONCE(*old, new_val);
> +		return;
> +	}

Why is the function called `decay`? In case `new_val >= old_val` you set
old = new and bail out, so it's really more of an `update` function?

> +	if (!last_update_time)
> +		return;
> +	delta = now - last_update_time;
> +	if ((s64)delta < 0)
> +		return;
> +	delta >>= 10;
> +	if (!delta)
> +		return;
> +
> +	delta += period_contrib;
> +	periods = delta / 1024;
> +	if (periods) {
> +		u64 diff = old_val - new_val;
> +
> +		/*
> +		 * Let's assume 3 tasks, A, B and C. A is still on rq but B and
> +		 * C have just been dequeued. The cfs.avg.util_avg_uclamp has
> +		 * become A but root_cfs_util_uclamp just starts to decay and is
> +		 * now still A + B + C.
> +		 *
> +		 * After p periods with y being the decay factor, the new
> +		 * root_cfs_util_uclamp should become
> +		 *
> +		 * A + B * y^p + C * y^p == A + (A + B + C - A) * y^p
> +		 *     == cfs.avg.util_avg_uclamp +
> +		 *        (root_cfs_util_uclamp_at_the_start - cfs.avg.util_avg_uclamp) * y^p
> +		 *     == cfs.avg.util_avg_uclamp + diff * y^p
> +		 *
> +		 * So, instead of summing up each individual decayed values, we
> +		 * could just decay the diff and not bother with the summation
> +		 * at all. This is why we decay the diff here.
> +		 */
> +		diff = decay_load(diff, periods);
> +		WRITE_ONCE(*old, new_val + diff);
> +	}
> +}

Looks like ___decay_util_avg_uclamp_towards() is used for:

(1) tasks with !se->on_rq to decay before enqueue

(2) rq->root_cfs_util_uclamp to align with
    rq_of(cfs_rq)->cfs.avg.util_avg_uclamp

All the cfs_rq's and the taskgroup se's seem to be updated only in
___update_util_avg_uclamp() (which also handles the propagation towards
the root taskgroup).

[...]
