[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20250819100838.GH3245006@noisy.programming.kicks-ass.net>
Date: Tue, 19 Aug 2025 12:08:38 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: liuwenfang <liuwenfang@...or.com>
Cc: 'Tejun Heo' <tj@...nel.org>, 'David Vernet' <void@...ifault.com>,
'Andrea Righi' <arighi@...dia.com>,
'Changwoo Min' <changwoo@...lia.com>,
'Ingo Molnar' <mingo@...hat.com>,
'Juri Lelli' <juri.lelli@...hat.com>,
'Vincent Guittot' <vincent.guittot@...aro.org>,
'Dietmar Eggemann' <dietmar.eggemann@....com>,
'Steven Rostedt' <rostedt@...dmis.org>,
'Ben Segall' <bsegall@...gle.com>, 'Mel Gorman' <mgorman@...e.de>,
'Valentin Schneider' <vschneid@...hat.com>,
"'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
joelagnelf@...dia.com
Subject: Re: 回复: [PATCH v4 2/3] sched_ext: Fix cpu_released while RT task
and SCX task are scheduled concurrently
On Tue, Aug 19, 2025 at 08:47:38AM +0000, liuwenfang wrote:
> > Oh gawd, this is terrible.
> >
> > Why would you start the pick in balance and then cry when you fail the pick in
> > pick ?!?
> >
> > This is also the reason you need that weird CLASS_EXT exception in
> > prev_balance(), isn't it?
> Yeah, you are right, because there is task migration in our exception process.
> >
> > You're now asking for a 3rd call out to do something like:
> >
> > ->balance() -- ready a task for pick
> We must clarify that the target SCX task is currently located in the global queue, and it's CPU selection maybe CPU2,
> when the current CPU0 will be idle, this SCX task should be migrated to CPU0.
> > ->pick() -- picks a random other task
> The rq lock of CPU0 will be released during task migration, and higher priority task will be placed on CPU0 rq,
> So the CPU0 will not always pick the target SCX task timely.
> > ->put_prev() -- oops, our task didn't get picked, stick it back
> The higher priority task may cost a long time on CPU0, so we need to get the SCX task back for its low latency demand.
> >
> > Which is bloody ludicrous. So no. We're not doing this.
> >
> > Why can't pick DTRT ?
> This's why the CPU0 cannot pick one SCX task directly which task_cpu() is not CPU0.
Can you please have a look at these patches from Joel:
https://lkml.kernel.org/r/20250809184800.129831-1-joelagnelf@nvidia.com
I think it is dealing with a similar problem, and possibly making your
issue worse. At the same time, it might just be sufficient to solve
both.
It would be really good if we can get rid of this prev_balance() hack and
instead make sched_class::pick_task() able to deal with the whole thing.
Notably, I'm thinking that by passing &rf to pick_task() it becomes
possible to do something like (and I'm sketchy here because I'm never
quite sure how ext actually works):
pick_task_scx(rq, rf)
{
again:
p = pick_local_task();
if (!p) {
/*
* comment that explains why we can drop rq-lock here
*/
unlock(rq, rf);
... get task from global thing ...
lock(rq, rf);
if (rq->nr_running != rq->ext.nr_running)
return RETRY_PICK;
goto again;
}
return p;
}
And Joel's code seem very close to that, but doesn't seem to deal with
your issue.
Anyway, the above has this: 'rq->nr_running != rq->ext.nr_running'
thing, which is supposed to check if there is a higher class runnable
task, in which case it must 'abort' and re-try the pick. We should do
this every time we drop rq->lock.
Now in the case of a dl-server, we'll obviously have the dl-server be
runnable (in fact, its pick is what gets you here in the first place).
But its presence will then make you re-try the pick.
It might just work, it might need a little help -- please carefully
consider that case.
Powered by blists - more mailing lists