linux-kernel - Re: [PATCH 4/6 v8] sched/fair: Add push task mechanism for fair

Message-ID: <aTLdncXYqNyF9Bqq@vingu-cube>
Date: Fri, 5 Dec 2025 14:26:53 +0100
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...hat.com, juri.lelli@...hat.com, dietmar.eggemann@....com,
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
	vschneid@...hat.com, linux-kernel@...r.kernel.org,
	pierre.gondois@....com, kprateek.nayak@....com, qyousef@...alina.io,
	hongyan.xia2@....com, christian.loehle@....com,
	luis.machado@....com
Subject: Re: [PATCH 4/6 v8] sched/fair: Add push task mechanism for fair

On Friday 05 Dec 2025 at 09:59:12 (+0100), Peter Zijlstra wrote:
> On Thu, Dec 04, 2025 at 03:34:15PM +0100, Vincent Guittot wrote:
> > On Thu, 4 Dec 2025 at 12:29, Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Tue, Dec 02, 2025 at 07:12:40PM +0100, Vincent Guittot wrote:
> > > > +/*
> > > > + * See if the non-running fair tasks on this rq can be sent to other CPUs
> > > > + * that fit their profile better.
> > > > + */
> > > > +static bool push_fair_task(struct rq *rq)
> > > > +{
> > > > +     struct task_struct *next_task;
> > > > +     int prev_cpu, new_cpu;
> > > > +     struct rq *new_rq;
> > > > +
> > > > +     next_task = pick_next_pushable_fair_task(rq);
> > > > +     if (!next_task)
> > > > +             return false;
> > > > +
> > > > +     if (is_migration_disabled(next_task))
> > > > +             return true;
> > > > +
> > > > +     /* We might release rq lock */
> > > > +     get_task_struct(next_task);
> > > > +
> > > > +     prev_cpu = rq->cpu;
> > > > +
> > > > +     new_cpu = select_task_rq_fair(next_task, prev_cpu, 0);
> > > > +
> > > > +     if (new_cpu == prev_cpu)
> > > > +             goto out;
> > > > +
> > > > +     new_rq = cpu_rq(new_cpu);
> > > > +
> > > > +     if (double_lock_balance(rq, new_rq)) {
> > > > +             /* The task has already migrated in between */
> > > > +             if (task_cpu(next_task) != rq->cpu) {
> > > > +                     double_unlock_balance(rq, new_rq);
> > > > +                     goto out;
> > > > +             }
> > > > +
> > > > +             deactivate_task(rq, next_task, 0);
> > > > +             set_task_cpu(next_task, new_cpu);
> > > > +             activate_task(new_rq, next_task, 0);
> > > > +
> > > > +             resched_curr(new_rq);
> > > > +
> > > > +             double_unlock_balance(rq, new_rq);
> > > > +     }
> > >
> > > Why not use move_queued_task() ?
> > 
> > double_lock_balance() can fail, which avoids blocking while waiting for
> > the new rq, whereas move_queued_task() will wait, won't it?
> > 
> > Do you think move_queued_task() would be better?
> 
> No, double_lock_balance() never fails; the return value indicates whether the
> currently held rq-lock (the first argument) was unlocked while
> acquiring both -- this is required when the first rq is at a higher address
> than the second.
> 
> double_lock_balance() also puts the wait time and hold time of the
> second inside the hold time of the first, which gets you a quadratic term
> in the rq hold times IIRC. Something that's best avoided.
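The return-value semantics described above can be illustrated with a rough userspace analogue. This is a sketch under stated assumptions, not the kernel's implementation: pthread mutexes stand in for rq locks, the names `struct rq_sketch`, `double_lock_balance_sketch()` and `double_unlock_balance_sketch()` are hypothetical, and the kernel's trylock fast path is omitted. Locks are always taken in ascending address order, so the held lock is dropped only when it sits at the higher address, and a return of 1 never means failure.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a runqueue: only its lock matters here. */
struct rq_sketch {
	pthread_mutex_t lock;
};

/*
 * Called with this_rq->lock held; returns with both locks held.
 * Returns 1 if this_rq->lock was dropped and re-acquired, 0 otherwise.
 * Acquisition follows ascending address order to avoid ABBA deadlocks.
 */
static int double_lock_balance_sketch(struct rq_sketch *this_rq,
				      struct rq_sketch *other)
{
	assert(this_rq != other);

	if ((uintptr_t)this_rq < (uintptr_t)other) {
		/* We hold the lower lock: waiting for other keeps the order. */
		pthread_mutex_lock(&other->lock);
		return 0;
	}

	/* We hold the higher lock: drop it, then take both in order. */
	pthread_mutex_unlock(&this_rq->lock);
	pthread_mutex_lock(&other->lock);
	pthread_mutex_lock(&this_rq->lock);
	return 1;
}

/* Releases only other's lock; this_rq stays locked for the caller. */
static void double_unlock_balance_sketch(struct rq_sketch *this_rq,
					 struct rq_sketch *other)
{
	(void)this_rq;
	pthread_mutex_unlock(&other->lock);
}
```

In the `return 0` path the wait for `other` happens while `this_rq` is still held, which is the nesting of wait and hold times mentioned above. In the `return 1` path other threads may have run in the window, so the caller must re-check anything it read before the call, as the fix below does with `task_cpu(next_task)`.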

Yeah, I misread the return value, and my current code needs to be fixed like this:

---
 kernel/sched/fair.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fbbe325dc633..35c7c968ddd2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8629,19 +8629,18 @@ static bool push_fair_task(struct rq *rq)
 
 	if (double_lock_balance(rq, new_rq)) {
 		/* The task has already migrated in between */
-		if (task_cpu(next_task) != rq->cpu) {
-			double_unlock_balance(rq, new_rq);
-			goto out;
-		}
+		if (task_cpu(next_task) != rq->cpu)
+			goto unlock;
+	}
 
-		deactivate_task(rq, next_task, 0);
-		set_task_cpu(next_task, new_cpu);
-		activate_task(new_rq, next_task, 0);
+	deactivate_task(rq, next_task, DEQUEUE_NOCLOCK);
+	set_task_cpu(next_task, new_cpu);
+	activate_task(new_rq, next_task, 0);
 
-		resched_curr(new_rq);
+	wakeup_preempt(new_rq, next_task, 0);
 
-		double_unlock_balance(rq, new_rq);
-	}
+unlock:
+	double_unlock_balance(rq, new_rq);
 
 out:
 	put_task_struct(next_task);
-- 
2.43.0



> 
> move_queued_task() OTOH takes the task off the runqueue you already hold
> locked, drops this lock, acquires the second, puts the task there, and
> returns with the dst rq locked.

I suppose it's doable even though we don't have rq_flags,
but we would need to re-lock the current rq and release the new one to leave the
balance_callback loop in the same state.
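For contrast, the hand-off described for move_queued_task() and the extra re-locking step mentioned in the reply can be sketched in userspace as follows. Everything here is a hypothetical illustration: `struct rq_sketch`, `move_queued_sketch()`, `push_and_restore_sketch()` and the counter standing in for a task queue are assumptions, and the real move_queued_task() also takes `rq_flags` for its unlock/lock calls. The point it shows is that the two locks are never held together, at the cost of a window in which the source rq is unlocked.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical runqueue: a lock protecting a count of queued tasks. */
struct rq_sketch {
	pthread_mutex_t lock;
	int nr_queued;
};

/*
 * Called with src->lock held. Moves one task to dst and returns with
 * only dst->lock held. The two locks are never held at the same time.
 */
static struct rq_sketch *move_queued_sketch(struct rq_sketch *src,
					    struct rq_sketch *dst)
{
	assert(src->nr_queued > 0);

	src->nr_queued--;               /* "deactivate" on the source rq */
	pthread_mutex_unlock(&src->lock);

	pthread_mutex_lock(&dst->lock);
	dst->nr_queued++;               /* "activate" on the destination rq */
	return dst;
}

/*
 * Called with src->lock held and returns with src->lock held again, so a
 * caller looping under src->lock sees the lock state it started with.
 * Anything read under src->lock before this call may now be stale.
 */
static void push_and_restore_sketch(struct rq_sketch *src,
				    struct rq_sketch *dst)
{
	struct rq_sketch *locked = move_queued_sketch(src, dst);

	pthread_mutex_unlock(&locked->lock);
	pthread_mutex_lock(&src->lock);
}
```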

> 
> > In case of migrate_disable, push_fair_task() returns true and we
> > continue with the next task (there should not be many anyway). If the
> > task is migrate_disabled when we try to push it, we remove it from the
> > list anyway. For now, we try not to have more than 1 task in the list
> > to cap the overhead on sched_switch.
> 
> Right, clearly I needed more wake-up juice, I thought it returned false
> and would stick around.
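The loop semantics discussed above (a migration-disabled task is taken off the list and the call still returns true, so the caller moves on; false only once nothing is left) can be sketched as below. This is a hypothetical model, not the patch's code: `struct task_sketch`, `struct pushable_list`, `push_one()` and `push_all()` are assumed names, an array replaces the per-rq list, and setting `pushed` stands in for the actual migration done by push_fair_task().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task: only the properties this loop cares about. */
struct task_sketch {
	bool migration_disabled;
	bool pushed;
};

/* Hypothetical pushable list: a small array consumed from the front. */
struct pushable_list {
	struct task_sketch *tasks[4];
	size_t head, len;
};

/* Picking a task also removes it from the list. */
static struct task_sketch *pick_next_pushable(struct pushable_list *l)
{
	assert(l->len <= sizeof(l->tasks) / sizeof(l->tasks[0]));

	if (l->head == l->len)
		return NULL;
	return l->tasks[l->head++];
}

/* Returns false only when the list is empty. */
static bool push_one(struct pushable_list *l)
{
	struct task_sketch *t = pick_next_pushable(l);

	if (!t)
		return false;
	if (t->migration_disabled)
		return true;    /* already dropped from the list; keep going */

	t->pushed = true;       /* stands in for migrating the task */
	return true;
}

/* Mirrors a balance callback that keeps pushing until nothing is left. */
static void push_all(struct pushable_list *l)
{
	while (push_one(l))
		;
}
```

Because every call consumes one entry, the loop always terminates, and a migration-disabled task cannot keep it spinning; keeping the list to about one task, as described above, bounds how much work this adds to sched_switch.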