linux-kernel - Re: [PATCH] sched: Fix tg->load when offlining a CPU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2769a37a-fc76-45b7-9ae7-d328778a25fc@oracle.com>
Date: Fri, 22 Dec 2023 04:04:23 +1100
From: Imran Khan <imran.f.khan@...cle.com>
To: Vincent Guittot <vincent.guittot@...aro.org>, mingo@...hat.com,
        peterz@...radead.org, juri.lelli@...hat.com, dietmar.eggemann@....com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com, aaron.lu@...el.com,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched: Fix tg->load when offlining a CPU

Thanks again for the fix Vincent.

On 22/12/2023 3:40 am, Vincent Guittot wrote:
> When a CPU taken offline, the contribution of its cfs_rqs to task_groups'
> load may remain and impact the calculation of the share of the online
> CPUs.
> Clear the contribution of an offlining CPU to task groups' load and skip
> its contribution while it is inactive.
> 
> Reported-by: Imran Khan <imran.f.khan@...cle.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>

Reviewed-and-tested-by: Imran Khan <imran.f.khan@...cle.com>

> ---
>  kernel/sched/fair.c | 50 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 50 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 07200b4751b3..07edb7dce671 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4094,6 +4094,10 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
>  	if (cfs_rq->tg == &root_task_group)
>  		return;
>  
> +	/* rq has been offline and doesn't contribute anymore to the share */
> +	if (!cpu_active(cpu_of(rq_of(cfs_rq))))
> +		return;
> +
>  	/*
>  	 * For migration heavy workloads, access to tg->load_avg can be
>  	 * unbound. Limit the update rate to at most once per ms.
> @@ -4110,6 +4114,49 @@ static inline void update_tg_load_avg(struct cfs_rq *cfs_rq)
>  	}
>  }
>  
> +static inline void clear_tg_load_avg(struct cfs_rq *cfs_rq)
> +{
> +	long delta;
> +	u64 now;
> +
> +	/*
> +	 * No need to update load_avg for root_task_group as it is not used.
> +	 */
> +	if (cfs_rq->tg == &root_task_group)
> +		return;
> +
> +	now = sched_clock_cpu(cpu_of(rq_of(cfs_rq)));
> +	delta = 0 - cfs_rq->tg_load_avg_contrib;
> +	atomic_long_add(delta, &cfs_rq->tg->load_avg);
> +	cfs_rq->tg_load_avg_contrib = 0;
> +	cfs_rq->last_update_tg_load_avg = now;
> +}
> +
> +/* cpu offline callback */
> +static void __maybe_unused clear_tg_offline_cfs_rqs(struct rq *rq)
> +{
> +	struct task_group *tg;
> +
> +	lockdep_assert_rq_held(rq);
> +
> +	/*
> +	 * The rq clock has already been updated in the
> +	 * set_rq_offline(), so we should skip updating
> +	 * the rq clock again in unthrottle_cfs_rq().
> +	 */
> +	rq_clock_start_loop_update(rq);
> +
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(tg, &task_groups, list) {
> +		struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
> +
> +		clear_tg_load_avg(cfs_rq);
> +	}
> +	rcu_read_unlock();
> +
> +	rq_clock_stop_loop_update(rq);
> +}
> +
>  /*
>   * Called within set_task_rq() right before setting a task's CPU. The
>   * caller only guarantees p->pi_lock is held; no other assumptions,
> @@ -12423,6 +12470,9 @@ static void rq_offline_fair(struct rq *rq)
>  
>  	/* Ensure any throttled groups are reachable by pick_next_task */
>  	unthrottle_offline_cfs_rqs(rq);
> +
> +	/* Ensure that we remove rq contribution to group share */
> +	clear_tg_offline_cfs_rqs(rq);
>  }
>  
>  #endif /* CONFIG_SMP */