linux-kernel - Re: [RFC PATCH 4/5] sched/fair: Rework inter-NUMA newidle balancing

Message-ID: <20250410101402.GC30687@noisy.programming.kicks-ass.net>
Date: Thu, 10 Apr 2025 12:14:02 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	linux-kernel@...r.kernel.org,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	"Gautham R. Shenoy" <gautham.shenoy@....com>,
	Swapnil Sapkal <swapnil.sapkal@....com>
Subject: Re: [RFC PATCH 4/5] sched/fair: Rework inter-NUMA newidle balancing

On Wed, Apr 09, 2025 at 11:15:38AM +0000, K Prateek Nayak wrote:
> +static inline int sched_newidle_pull_overloaded(struct sched_domain *sd,
> +						struct rq *this_rq,
> +						int *continue_balancing)
> +{
> +	struct cpumask *cpus = this_cpu_cpumask_var_ptr(load_balance_mask);
> +	int cpu, this_cpu = cpu_of(this_rq);
> +	struct sched_domain *sd_parent;
> +	struct lb_env env = {
> +		.dst_cpu	= this_cpu,
> +		.dst_rq		= this_rq,
> +		.idle		= CPU_NEWLY_IDLE,
> +	};
> +
> +
> +	cpumask_and(cpus, sched_domain_span(sd), cpu_active_mask);
> +
> +next_domain:
> +	env.sd = sd;
> +	/* Allow migrating cache_hot tasks too. */
> +	sd->nr_balance_failed = sd->cache_nice_tries + 1;
> +
> +	for_each_cpu_wrap(cpu, cpus, this_cpu) {
> +		struct sched_domain_shared *sd_share;
> +		struct cpumask *overloaded_mask;
> +		struct sched_domain *cpu_llc;
> +		int overloaded_cpu;
> +
> +		cpu_llc = rcu_dereference(per_cpu(sd_llc, cpu));
> +		if (!cpu_llc)
> +			break;
> +
> +		sd_share = cpu_llc->shared;
> +		if (!sd_share)
> +			break;
> +
> +		overloaded_mask = sd_share->overloaded_mask;
> +		if (!overloaded_mask)
> +			break;
> +
> +		for_each_cpu_wrap(overloaded_cpu, overloaded_mask, this_cpu + 1) {
> +			struct rq *overloaded_rq = cpu_rq(overloaded_cpu);
> +			struct task_struct *p = NULL;
> +
> +			if (sched_newidle_continue_balance(this_rq)) {
> +				*continue_balancing = 0;
> +				return 0;
> +			}
> +
> +			/* Quick peek to find if pushable tasks exist. */
> +			if (!has_pushable_tasks(overloaded_rq))
> +				continue;
> +
> +			scoped_guard (rq_lock, overloaded_rq) {
> +				update_rq_clock(overloaded_rq);
> +
> +				if (!has_pushable_tasks(overloaded_rq))
> +					break;

You can skip the clock update if there aren't any tasks to grab.
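
A minimal, single-threaded userspace sketch of the suggested ordering, assuming simplified stand-ins: `struct model_rq`, `model_try_pull_one()` and the other `model_*` names are hypothetical, not kernel APIs. They model `struct rq`, `has_pushable_tasks()` and `update_rq_clock()`. The lockless peek is kept and the check is repeated under the lock. The clock is refreshed only once a task is actually going to be taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of a runqueue; NOT the kernel's struct rq.
 * nr_pushable stands in for has_pushable_tasks(), clock for rq->clock.
 */
struct model_rq {
	atomic_flag lock;
	int nr_pushable;
	unsigned long clock;
	unsigned int clock_updates;	/* how often the clock was refreshed */
};

static void model_rq_lock(struct model_rq *rq)
{
	while (atomic_flag_test_and_set_explicit(&rq->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void model_rq_unlock(struct model_rq *rq)
{
	atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

static bool model_has_pushable_tasks(const struct model_rq *rq)
{
	return rq->nr_pushable > 0;
}

static void model_update_rq_clock(struct model_rq *rq)
{
	rq->clock++;
	rq->clock_updates++;
}

/*
 * Suggested ordering: lockless peek, lock, re-check, and only then update
 * the clock, so an empty runqueue never pays for a clock update.
 * Returns true if one task was pulled.
 */
static bool model_try_pull_one(struct model_rq *rq)
{
	bool pulled = false;

	/* Quick hint; avoids taking the lock when nothing is pushable. */
	if (!model_has_pushable_tasks(rq))
		return false;

	model_rq_lock(rq);
	/* Re-check under the lock: tasks may have left since the peek. */
	if (model_has_pushable_tasks(rq)) {
		model_update_rq_clock(rq);
		rq->nr_pushable--;
		assert(rq->nr_pushable >= 0);
		pulled = true;
	}
	model_rq_unlock(rq);

	return pulled;
}
```

Calling `model_try_pull_one()` on an empty runqueue leaves `clock_updates` at zero; the counter only advances when a task is actually pulled.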

> +
> +				env.src_cpu = overloaded_cpu;
> +				env.src_rq = overloaded_rq;
> +
> +				p = detach_one_task(&env);

Yep, detach_one_task() uses can_migrate_task() which checks
task_on_cpu(), so that's all good :-)
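
To illustrate why `detach_one_task()` is safe here, the following is a simplified, hypothetical userspace model of the `can_migrate_task()` filtering it relies on. `struct model_task`, `struct model_env` and the `model_*` functions are illustrative names only. The real `can_migrate_task()` also checks CPU affinity and more, and those checks are omitted. The model also shows the effect of the patch setting `nr_balance_failed` to `cache_nice_tries + 1`: once the failure count exceeds `cache_nice_tries`, cache-hot tasks are no longer rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified task model. */
struct model_task {
	int id;
	bool on_cpu;		/* stand-in for task_on_cpu() */
	bool cache_hot;		/* stand-in for task_hot() */
	bool queued;		/* still on the source runqueue */
};

/* Hypothetical subset of lb_env / sched_domain state. */
struct model_env {
	unsigned int nr_balance_failed;	/* models sd->nr_balance_failed */
	unsigned int cache_nice_tries;	/* models sd->cache_nice_tries */
};

/* Simplified can_migrate_task(): affinity and other checks omitted. */
static bool model_can_migrate_task(const struct model_task *p,
				   const struct model_env *env)
{
	/* Never pull the task that is currently running on the source CPU. */
	if (p->on_cpu)
		return false;

	/* Cache-hot tasks are only allowed after enough failed attempts. */
	if (p->cache_hot && env->nr_balance_failed <= env->cache_nice_tries)
		return false;

	return true;
}

/* Simplified detach_one_task(): detach the first migratable task. */
static struct model_task *model_detach_one_task(struct model_task *tasks,
						size_t nr,
						const struct model_env *env)
{
	assert(tasks != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		struct model_task *p = &tasks[i];

		if (!p->queued || !model_can_migrate_task(p, env))
			continue;

		p->queued = false;	/* "detach" from the source runqueue */
		return p;
	}
	return NULL;
}
```

With `nr_balance_failed <= cache_nice_tries` a cache-hot candidate is skipped. Raising `nr_balance_failed` to `cache_nice_tries + 1` lets it through, while the running task is never chosen in either case.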

> +			}
> +
> +			if (!p)
> +				continue;
> +
> +			attach_one_task(this_rq, p);
> +			return 1;
> +		}
> +
> +		cpumask_andnot(cpus, cpus, sched_domain_span(cpu_llc));
> +	}