linux-kernel - Re: [PATCH v4] sched/fair: do not scan twice in detach

Message-ID: <xhsmhv7nleqfl.mognet@vschneid-thinkpadt14sgen2i.remote.csb>
Date: Mon, 21 Jul 2025 13:25:34 +0200
From: Valentin Schneider <vschneid@...hat.com>
To: Vincent Guittot <vincent.guittot@...aro.org>, Huang Shijie
 <shijie@...amperecomputing.com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
 patches@...erecomputing.com, cl@...ux.com,
 Shubhang@...amperecomputing.com, dietmar.eggemann@....com,
 rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4] sched/fair: do not scan twice in detach_tasks()

On 21/07/25 11:40, Vincent Guittot wrote:
> On Mon, 21 Jul 2025 at 04:40, Huang Shijie
> <shijie@...amperecomputing.com> wrote:
>>
>> detach_tasks() uses struct lb_env.loop_max as an env.src_rq->cfs_tasks
>> iteration count limit. It is however set without the source RQ lock held,
>> and besides detach_tasks() can be re-invoked after releasing and
>> re-acquiring the RQ lock per LBF_NEED_BREAK.
>>
>> This means that env.loop_max and the actual length of env.src_rq->cfs_tasks
>> as observed within detach_tasks() can differ. This can cause some tasks to
>
> why not setting env.loop_max only once rq lock is taken in this case ?
>
> side note : by default loop_max <= loop_break
>

I thought so too and dismissed that due to LBF_NEED_BREAK, but I guess we
could still do something like:
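
As a userspace illustration of the idea only (not kernel code: struct
env_sketch, min3u() and clamp_loop_max() are names made up for this sketch,
and the numbers fed to it are arbitrary), start loop_max at UINT_MAX and
clamp it each time the busiest RQ lock is taken. Assuming the clamp includes
the previous loop_max, the value can only shrink across LBF_NEED_BREAK
re-entries:

```c
#include <limits.h>

/* Hypothetical stand-in for the one field of struct lb_env used here. */
struct env_sketch {
	unsigned int loop_max;
};

/* Plain-function equivalent of a three-way minimum on unsigned ints. */
static unsigned int min3u(unsigned int a, unsigned int b, unsigned int c)
{
	unsigned int m = a < b ? a : b;

	return m < c ? m : c;
}

/*
 * Models the clamp done right after the source RQ lock is taken.
 * nr_migrate plays the role of sysctl_sched_nr_migrate, nr_queued the
 * number of queued tasks as sampled under the lock. Because the old
 * loop_max is one of the three inputs, repeated calls never raise it.
 */
static void clamp_loop_max(struct env_sketch *env, unsigned int nr_migrate,
			   unsigned int nr_queued)
{
	env->loop_max = min3u(env->loop_max, nr_migrate, nr_queued);
}
```

With loop_max initialised to UINT_MAX, the first call just picks the smaller
of the two sampled values; later calls can lower it if the queue shrank while
the lock was dropped, but never raise it again. In fair.c that would be: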

---
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b9b4bbbf0af6f..eef3a0d341661 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -11643,6 +11643,7 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
 		.dst_grpmask    = group_balance_mask(sd->groups),
 		.idle		= idle,
 		.loop_break	= SCHED_NR_MIGRATE_BREAK,
+		.loop_max       = UINT_MAX,
 		.cpus		= cpus,
 		.fbq_type	= all,
 		.tasks		= LIST_HEAD_INIT(env.tasks),
@@ -11681,18 +11682,19 @@ static int sched_balance_rq(int this_cpu, struct rq *this_rq,
 	/* Clear this flag as soon as we find a pullable task */
 	env.flags |= LBF_ALL_PINNED;
 	if (busiest->nr_running > 1) {
+more_balance:
 		/*
 		 * Attempt to move tasks. If sched_balance_find_src_group has found
 		 * an imbalance but busiest->nr_running <= 1, the group is
 		 * still unbalanced. ld_moved simply stays zero, so it is
 		 * correctly treated as an imbalance.
 		 */
-		env.loop_max  = min(sysctl_sched_nr_migrate, busiest->nr_running);
-
-more_balance:
 		rq_lock_irqsave(busiest, &rf);
 		update_rq_clock(busiest);
 
+
+		env.loop_max = min3(env.loop_max, sysctl_sched_nr_migrate, busiest->h_nr_running);
+
 		/*
 		 * cur_ld_moved - load moved in current iteration
 		 * ld_moved     - cumulative load moved across iterations