linux-kernel - Re: [PATCH v9 2/7] sched/fair: Decay task PELT values during wakeup migration

Message-ID: <72bd6945-c167-65ba-6f81-fad2768972dc@arm.com>
Date:   Tue, 31 May 2022 10:16:05 +0200
From:   Dietmar Eggemann <dietmar.eggemann@....com>
To:     Vincent Donnefort <vdonnefort@...gle.com>, peterz@...radead.org,
        mingo@...hat.com, vincent.guittot@...aro.org
Cc:     linux-kernel@...r.kernel.org, morten.rasmussen@....com,
        chris.redpath@....com, qperret@...gle.com, tao.zhou@...ux.dev,
        kernel-team@...roid.com
Subject: Re: [PATCH v9 2/7] sched/fair: Decay task PELT values during wakeup
 migration

- Vincent Donnefort <vincent.donnefort@....com>

On 23/05/2022 17:51, Vincent Donnefort wrote:
> From: Vincent Donnefort <vincent.donnefort@....com>

[...]

> [1] https://lore.kernel.org/all/20190709115759.10451-1-chris.redpath@arm.com/

minor:

I get `WARNING: Possible unwrapped commit description (prefer a maximum
75 chars per line)`. If you use

https://lkml.kernel.org/r/20190709115759.10451-1-chris.redpath@arm.com

this warning disappears.

[...]

> +static inline void migrate_se_pelt_lag(struct sched_entity *se)
> +{
> +	u64 throttled = 0, now, lut;
> +	struct cfs_rq *cfs_rq;
> +	struct rq *rq;
> +	bool is_idle;
> +
> +	if (load_avg_is_decayed(&se->avg))
> +		return;
> +
> +	cfs_rq = cfs_rq_of(se);
> +	rq = rq_of(cfs_rq);
> +
> +	rcu_read_lock();
> +	is_idle = is_idle_task(rcu_dereference(rq->curr));
> +	rcu_read_unlock();
> +
> +	/*
> +	 * The lag estimation comes with a cost we don't want to pay all the
> +	 * time. Hence, limiting to the case where the source CPU is idle and
> +	 * we know we are at the greatest risk to have an outdated clock.
> +	 */
> +	if (!is_idle)
> +		return;
> +
> +	/*
> +	 * Estimated "now" is: last_update_time + cfs_idle_lag + rq_idle_lag, where:
> +	 *
> +	 *   last_update_time (the cfs_rq's last_update_time)
> +	 *	= cfs_rq_clock_pelt()
> +	 *      = rq_clock_pelt() - cfs->throttled_clock_pelt_time

So this line is always:

		= rq_clock_pelt()@cfs_rq_idle -
		  cfs->throttled_clock_pelt_time@..._rq_idle

since we only execute this code when the source CPU is idle, which IMHO
explains (1) better.

> +	 *
> +	 *   cfs_idle_lag (delta between cfs_rq's update and rq's update)
> +	 *      = rq_clock_pelt()@rq_idle - rq_clock_pelt()@cfs_rq_idle
> +	 *
> +	 *   rq_idle_lag (delta between rq's update and now)
> +	 *      = sched_clock_cpu() - rq_clock()@rq_idle
> +	 *
> +	 * The rq_clock_pelt() from last_update_time being the same as
> +	 * rq_clock_pelt()@cfs_rq_idle, we can write:

--> (1)    ^^^

> +	 *
> +	 *    now = rq_clock_pelt()@rq_idle - cfs->throttled_clock_pelt_time +
> +	 *          sched_clock_cpu() - rq_clock()@rq_idle
> +	 * Where:
> +	 *      rq_clock_pelt()@rq_idle        is rq->clock_pelt_idle
> +	 *      rq_clock()@rq_idle             is rq->enter_idle
> +	 *      cfs->throttled_clock_pelt_time is cfs_rq->throttled_pelt_idle

To understand this better:

		cfs->throttled_clock_pelt_time@..._rq_idle is
		cfs_rq->throttled_pelt_idle
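
To make the arithmetic in the comment concrete, here is a minimal standalone sketch of the final equation. It is not the patch's code: `struct idle_snapshot` and `estimate_now()` are hypothetical names, and the struct only mirrors the three snapshot fields named in the comment.

```c
#include <stdint.h>

/* Hypothetical stand-in for the three snapshot fields named in the comment. */
struct idle_snapshot {
	uint64_t clock_pelt_idle;     /* rq_clock_pelt() when the rq went idle */
	uint64_t enter_idle;          /* rq_clock() when the rq went idle */
	uint64_t throttled_pelt_idle; /* cfs_rq throttled PELT time snapshot */
};

/*
 * Illustrative only:
 *   now = clock_pelt_idle - throttled_pelt_idle
 *         + (sched_clock_now - enter_idle)
 * sched_clock_now stands for the value sched_clock_cpu() would return.
 */
static inline uint64_t estimate_now(const struct idle_snapshot *s,
				    uint64_t sched_clock_now)
{
	uint64_t now = s->clock_pelt_idle - s->throttled_pelt_idle;

	return now + (sched_clock_now - s->enter_idle);
}
```

With `clock_pelt_idle = 1000`, `throttled_pelt_idle = 200`, `enter_idle = 5000` and a current clock of 5300, the estimate is 800 + 300 = 1100.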

[...]

> +	/*
> +	 * Paired with _update_idle_rq_clock_pelt. It ensures at the worst case

minor:

s/_update_idle_rq_clock_pelt/_update_idle_rq_clock_pelt()

> +	 * is observed the old clock_pelt_idle value and the new enter_idle,
> +	 * which lead to an understimation. The opposite would lead to an

s/understimation/underestimation
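
For readers following the barrier pairing, here is a userspace sketch of the ordering the comment describes, written with C11 atomics rather than the kernel's smp_wmb()/smp_rmb(). The struct and function names are hypothetical, and the reader-side load order is my assumption based on the "old clock_pelt_idle, new enter_idle" worst case.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the two rq snapshot fields. */
struct rq_idle_clocks {
	_Atomic uint64_t enter_idle;
	_Atomic uint64_t clock_pelt_idle;
};

/* Writer: publish enter_idle first, then clock_pelt_idle. */
static inline void publish_idle_clocks(struct rq_idle_clocks *c,
				       uint64_t clock, uint64_t clock_pelt)
{
	atomic_store_explicit(&c->enter_idle, clock, memory_order_relaxed);
	/* Plays the role of the write barrier in this sketch. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->clock_pelt_idle, clock_pelt,
			      memory_order_relaxed);
}

/* Reader (assumed order): load clock_pelt_idle first, then enter_idle. */
static inline uint64_t read_idle_estimate(struct rq_idle_clocks *c,
					  uint64_t sched_clock_now)
{
	uint64_t pelt = atomic_load_explicit(&c->clock_pelt_idle,
					     memory_order_relaxed);
	/* Plays the role of the read barrier in this sketch. */
	atomic_thread_fence(memory_order_acquire);
	uint64_t enter = atomic_load_explicit(&c->enter_idle,
					      memory_order_relaxed);

	return pelt + (sched_clock_now - enter);
}
```

If the reader sees the new clock_pelt_idle it also sees the new enter_idle. Otherwise it may combine the old clock_pelt_idle with the new enter_idle, which makes the result smaller than the true value, i.e. the underestimation case.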

[...]

> @@ -8114,6 +8212,10 @@ static bool __update_blocked_fair(struct rq *rq, bool *done)
>  		if (update_cfs_rq_load_avg(cfs_rq_clock_pelt(cfs_rq), cfs_rq)) {
>  			update_tg_load_avg(cfs_rq);
>  
> +			/* sync clock_pelt_idle with last update */

update_idle_cfs_rq_clock_pelt() syncs cfs_rq->throttled_pelt_idle with
cfs_rq->throttled_clock_pelt_time. What do `clock_pelt_idle` and
`last update` refer to here?

[...]

> +/* The rq is idle, we can sync to clock_task */
> +static inline void _update_idle_rq_clock_pelt(struct rq *rq)
> +{
> +	rq->clock_pelt  = rq_clock_task(rq);
> +
> +	u64_u32_store(rq->enter_idle, rq_clock(rq));
> +	/* Paired with smp_rmb in migrate_se_pelt_lag */

minor:

s/migrate_se_pelt_lag/migrate_se_pelt_lag()

[...]

> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index bf4a0ec98678..97bc26e5c8af 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -648,6 +648,10 @@ struct cfs_rq {
>  	int			runtime_enabled;
>  	s64			runtime_remaining;
>  
> +	u64			throttled_pelt_idle;
> +#ifndef CONFIG_64BIT
> +	u64                     throttled_pelt_idle_copy;
> +#endif
>  	u64			throttled_clock;
>  	u64			throttled_clock_pelt;
>  	u64			throttled_clock_pelt_time;
> @@ -1020,6 +1024,12 @@ struct rq {
>  	u64			clock_task ____cacheline_aligned;
>  	u64			clock_pelt;
>  	unsigned long		lost_idle_time;
> +	u64			clock_pelt_idle;
> +	u64			enter_idle;
> +#ifndef CONFIG_64BIT
> +	u64			clock_pelt_idle_copy;
> +	u64			enter_idle_copy;
> +#endif
>  
>  	atomic_t		nr_iowait;

`throttled_pelt_idle`, `clock_pelt_idle` and `enter_idle` are clock
snapshots taken when the cfs_rq or the rq, respectively, goes idle. The
naming does not really show this relation, which makes the equations
rather difficult to read.

What about something like `throttled_clock_pelt_time_enter_idle`,
`clock_pelt_enter_idle`, `clock_enter_idle`? The first one especially is
too long, but names that show these are clock snapshots taken on
entering idle would IMHO improve readability in migrate_se_pelt_lag().

Besides these small issues:

Reviewed-by: Dietmar Eggemann <dietmar.eggemann@....com>