linux-kernel - Re: [PATCH 2/2] sched/deadline: Correctly account for allocated bandwidth during hotplug

Message-ID: <8e55c640-c931-4b9c-a501-c5b0a654a420@redhat.com>
Date: Wed, 13 Nov 2024 11:06:24 -0500
From: Waiman Long <llong@...hat.com>
To: Juri Lelli <juri.lelli@...hat.com>, Tejun Heo <tj@...nel.org>,
 Johannes Weiner <hannes@...xchg.org>, Michal Koutny <mkoutny@...e.com>,
 Ingo Molnar <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>
Cc: Qais Yousef <qyousef@...alina.io>,
 Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
 "Joel Fernandes (Google)" <joel@...lfernandes.org>,
 Suleiman Souhlal <suleiman@...gle.com>, Aashish Sharma <shraash@...gle.com>,
 Shin Kawamura <kawasin@...gle.com>,
 Vineeth Remanan Pillai <vineeth@...byteword.org>,
 linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH 2/2] sched/deadline: Correctly account for allocated
 bandwidth during hotplug

On 11/13/24 7:57 AM, Juri Lelli wrote:
> For hotplug operations, DEADLINE needs to check that there is still enough
> bandwidth left after removing the CPU that is going offline. We however
> fail to do so currently.
>
> Restore the correct behavior by restructuring dl_bw_manage() a bit, so
> that overflow conditions (not enough bandwidth left) are properly
> checked. Also account for dl_server bandwidth, i.e. discount such
> bandwidth in the calculation since NORMAL tasks will be anyway moved
> away from the CPU as a result of the hotplug operation.
>
> Signed-off-by: Juri Lelli <juri.lelli@...hat.com>
> ---
>   kernel/sched/core.c     |  2 +-
>   kernel/sched/deadline.c | 33 ++++++++++++++++++++++++---------
>   kernel/sched/sched.h    |  2 +-
>   3 files changed, 26 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 43e453ab7e20..d1049e784510 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -8057,7 +8057,7 @@ static void cpuset_cpu_active(void)
>   static int cpuset_cpu_inactive(unsigned int cpu)
>   {
>   	if (!cpuhp_tasks_frozen) {
> -		int ret = dl_bw_check_overflow(cpu);
> +		int ret = dl_bw_deactivate(cpu);
>   
>   		if (ret)
>   			return ret;
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index e53208a50279..609685c5df05 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -3467,29 +3467,31 @@ int dl_cpuset_cpumask_can_shrink(const struct cpumask *cur,
>   }
>   
>   enum dl_bw_request {
> -	dl_bw_req_check_overflow = 0,
> +	dl_bw_req_deactivate = 0,
>   	dl_bw_req_alloc,
>   	dl_bw_req_free
>   };
>   
>   static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
>   {
> -	unsigned long flags;
> +	unsigned long flags, cap;
>   	struct dl_bw *dl_b;
>   	bool overflow = 0;
> +	u64 fair_server_bw = 0;
>   
>   	rcu_read_lock_sched();
>   	dl_b = dl_bw_of(cpu);
>   	raw_spin_lock_irqsave(&dl_b->lock, flags);
>   
> -	if (req == dl_bw_req_free) {
> +	cap = dl_bw_capacity(cpu);
> +	switch (req) {
> +	case dl_bw_req_free:
>   		__dl_sub(dl_b, dl_bw, dl_bw_cpus(cpu));
> -	} else {
> -		unsigned long cap = dl_bw_capacity(cpu);
> -
> +		break;
> +	case dl_bw_req_alloc:
>   		overflow = __dl_overflow(dl_b, cap, 0, dl_bw);
>   
> -		if (req == dl_bw_req_alloc && !overflow) {
> +		if (!overflow) {
>   			/*
>   			 * We reserve space in the destination
>   			 * root_domain, as we can't fail after this point.
> @@ -3498,6 +3500,19 @@ static int dl_bw_manage(enum dl_bw_request req, int cpu, u64 dl_bw)
>   			 */
>   			__dl_add(dl_b, dl_bw, dl_bw_cpus(cpu));
>   		}
> +		break;
> +	case dl_bw_req_deactivate:
> +		/*
> +		 * cpu is going offline and NORMAL tasks will be moved away
> +		 * from it. We can thus discount dl_server bandwidth
> +		 * contribution as it won't need to be servicing tasks after
> +		 * the cpu is off.
> +		 */
> +		if (cpu_rq(cpu)->fair_server.dl_server)
> +			fair_server_bw = cpu_rq(cpu)->fair_server.dl_bw;
> +
> +		overflow = __dl_overflow(dl_b, cap, fair_server_bw, 0);
> +		break;

The dl_bw_req_deactivate case quoted above can still cause a failure in
one of the test cases in my cpuset partition test script. In this
particular case, the CPU to be offlined is an isolated CPU with
scheduling disabled. As a result, total_bw is 0 and the __dl_overflow()
test fails. Is there a way to skip the __dl_overflow() test for isolated
CPUs? Can we use a zero total_bw as a proxy for that?

Thanks,
Longman