Message-ID: <aHltRzhQjwPsGovj@slm.duckdns.org>
Date: Thu, 17 Jul 2025 11:38:15 -1000
From: 'Tejun Heo' <tj@...nel.org>
To: liuwenfang <liuwenfang@...or.com>
Cc: 'David Vernet' <void@...ifault.com>, 'Andrea Righi' <arighi@...dia.com>,
'Changwoo Min' <changwoo@...lia.com>,
'Ingo Molnar' <mingo@...hat.com>,
'Peter Zijlstra' <peterz@...radead.org>,
'Juri Lelli' <juri.lelli@...hat.com>,
'Vincent Guittot' <vincent.guittot@...aro.org>,
'Dietmar Eggemann' <dietmar.eggemann@....com>,
'Steven Rostedt' <rostedt@...dmis.org>,
'Ben Segall' <bsegall@...gle.com>, 'Mel Gorman' <mgorman@...e.de>,
'Valentin Schneider' <vschneid@...hat.com>,
"'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 1/2] sched_ext: Fix cpu_released while RT task and SCX
task are scheduled concurrently
Hello,
My apologies for the really late reply. I've been off work and ended up a lot
more offline than I expected.
On Sat, Jun 28, 2025 at 06:50:32AM +0000, liuwenfang wrote:
> Suppose RT task (RT1) is running on CPU0 and RT task (RT2) is woken up on CPU1.
> RT1 goes to sleep and SCX task (SCX1) is about to be dispatched to CPU0, while
> RT2 is placed on CPU0:
>
> CPU0(schedule)                            CPU1(try_to_wake_up)
> set_current_state(TASK_INTERRUPTIBLE)     try_to_wake_up # RT2
> __schedule                                select_task_rq # CPU0 is selected
> LOCK rq(0)->lock # lock CPU0 rq           ttwu_queue
> deactivate_task # RT1                     LOCK rq(0)->lock # busy waiting
> pick_next_task # no more RT tasks on rq   |
> prev_balance                              |
> balance_scx                               |
> balance_one                               |
> rq->scx.cpu_released = false;             |
> consume_global_dsq                        |
> consume_dispatch_q                        |
> consume_remote_task                       |
> UNLOCK rq(0)->lock                        V
>                                           LOCK rq(0)->lock # succ
> deactivate_task # SCX1                    ttwu_do_activate
> LOCK rq(0)->lock # busy waiting           activate_task # RT2 enqueued
> |                                         UNLOCK rq(0)->lock
> V
> LOCK rq(0)->lock # succ
> activate_task # SCX1
> pick_task # RT2 is picked
> put_prev_set_next_task # prev is RT1, next is RT2, rq->scx.cpu_released = false;
> UNLOCK rq(0)->lock
>
> In the end, RT2 runs on CPU0 with rq->scx.cpu_released being false!
>
> So, add scx_next_task_picked() and check the sched class again to fix the value
> of rq->scx.cpu_released.
Yeah, the problem and diagnosis look correct to me. It'd be nice if we didn't
have to add an explicit hook, but ops.cpu_acquire() needs to be called before
dispatching to the CPU, and we can then lose the CPU while doing ops.pick_task().
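
To make the sequence above concrete, here is a minimal userspace sketch of the
CPU0 side. It is only a model: struct rq, struct task, balance_one(),
next_task_picked() and replay() are simplified stand-ins made up for
illustration, not the kernel definitions, and the rq locking is reduced to a
comment.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; only what the race needs. */
enum class_id { CLASS_RT, CLASS_SCX };

struct task {
	const char *name;
	enum class_id class;
};

struct rq {
	bool cpu_released;
};

/* Models balance_one(): the BPF scheduler (re)acquires the CPU. */
static void balance_one(struct rq *rq)
{
	rq->cpu_released = false;
}

/* Models the proposed hook: recheck the class of the task actually picked. */
static void next_task_picked(struct rq *rq, const struct task *next)
{
	if (next->class != CLASS_SCX)
		rq->cpu_released = true;
}

/* Replays the CPU0 sequence and returns the final cpu_released value. */
static bool replay(bool with_hook)
{
	struct rq rq = { .cpu_released = true };	/* RT1 was running */
	struct task rt2 = { "RT2", CLASS_RT };
	const struct task *next;

	balance_one(&rq);	/* no RT task left, SCX acquires CPU0 */
	/* rq lock is dropped while consuming a remote task; RT2 is enqueued */
	next = &rt2;		/* pick_task() now picks RT2 */
	if (with_hook)
		next_task_picked(&rq, next);
	return rq.cpu_released;
}
```

In this model replay(false) leaves cpu_released false even though an RT task
ends up running, which is the reported symptom, while replay(true) sets it
back to true.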
> Signed-off-by: l00013971 <l00013971@...onor.com>
Can you please use "FIRST_NAME LAST_NAME <EMAIL>" when signing off?
> -static void switch_class(struct rq *rq, struct task_struct *next)
> +static void switch_class(struct rq *rq, struct task_struct *next, bool prev_on_scx)
>  {
>  	const struct sched_class *next_class = next->sched_class;
>
> @@ -3197,7 +3197,8 @@ static void switch_class(struct rq *rq, struct task_struct *next)
>  	 * kick_cpus_irq_workfn() who is waiting for this CPU to perform a
>  	 * resched.
>  	 */
> -	smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
> +	if (prev_on_scx)
> +		smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
This is obviously broken at the moment: the seq is only incremented on
scx -> !scx transitions, but it should be incremented on all transitions. The
breakage was introduced by b999e365c298 ("sched, sched_ext: Replace
scx_next_task_picked() with sched_class->switch_class()").
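
As a rough illustration of why the conditional bump matters, below is a
userspace sketch using C11 atomics. struct rq_model, task_switched() and
resched_observed() are hypothetical names, and the C11 release/acquire pair
only stands in for the kernel's smp_store_release() and the waiter's load.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of rq->scx.pnt_seq; not the kernel code. */
struct rq_model {
	atomic_ulong pnt_seq;
};

/*
 * Called once per task switch. With @scx_only set, the seq is bumped only
 * on scx -> !scx transitions (the broken behaviour); otherwise it is bumped
 * on every transition.
 */
static void task_switched(struct rq_model *rq, bool prev_on_scx,
			  bool next_on_scx, bool scx_only)
{
	unsigned long seq;

	if (scx_only && !(prev_on_scx && !next_on_scx))
		return;

	seq = atomic_load_explicit(&rq->pnt_seq, memory_order_relaxed);
	/* Release store pairs with the acquire load in resched_observed(). */
	atomic_store_explicit(&rq->pnt_seq, seq + 1, memory_order_release);
}

/* Models the waiter: has the CPU picked a task since @snapshot was taken? */
static bool resched_observed(struct rq_model *rq, unsigned long snapshot)
{
	return atomic_load_explicit(&rq->pnt_seq, memory_order_acquire) != snapshot;
}
```

With scx_only set, a waiter that snapshots the seq and then sees an RT -> RT
switch never observes progress. With the unconditional bump it does.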
> +void scx_next_task_picked(struct rq *rq, struct task_struct *prev,
> +			  struct task_struct *next)
> +{
> +	bool prev_on_scx = prev && (prev->sched_class == &ext_sched_class);
I don't think @prev or @next can ever be NULL, can they?
> +
> +	if (!scx_enabled() ||
Let's make this an inline function in ext.h. I think the pnt_seq update should
be moved here, after the scx_enabled() test. This should probably be a
separate patch.
> +	    !next ||
> +	    next->sched_class == &ext_sched_class)
> +		return;
> +
> +	switch_class(rq, next, prev_on_scx);
> +}
>
>  static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  			      struct task_struct *next)
>  {
> @@ -3253,7 +3267,7 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  		 */
>  		if (p->scx.slice && !scx_rq_bypassing(rq)) {
>  			dispatch_enqueue(&rq->scx.local_dsq, p, SCX_ENQ_HEAD);
> -			goto switch_class;
> +			return;
...
> @@ -2465,6 +2468,8 @@ static inline void put_prev_set_next_task(struct rq *rq,
>
>  	__put_prev_set_next_dl_server(rq, prev, next);
>
> +	scx_next_task_picked(rq, prev, next);
It's a bit unfortunate that we need to add this hook, but I can't see another
way around it for both the problem you're reporting and the pnt_seq issue.
Maybe name it scx_put_prev_set_next(rq, prev, next) for consistency?
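
Putting the comments above together, the hook could look roughly like the
sketch below. It's a userspace mock and not a proposed patch: the types,
scx_enabled_flag and scx_switch_class() are placeholders made up for
illustration, and the plain increment stands in for the
smp_store_release() update of pnt_seq.

```c
#include <stdbool.h>

/* Placeholder types; the real ones live in the scheduler headers. */
struct sched_class { int id; };
struct task_struct { const struct sched_class *sched_class; };
struct rq { unsigned long pnt_seq; bool cpu_released; };

static const struct sched_class ext_sched_class = { 1 };
static const struct sched_class rt_sched_class = { 2 };

/* Stand-in for the kernel's scx_enabled() test. */
static bool scx_enabled_flag = true;
static bool scx_enabled(void) { return scx_enabled_flag; }

/* Stand-in for the out-of-line switch_class() slow path. */
static void scx_switch_class(struct rq *rq, struct task_struct *next)
{
	(void)next;
	rq->cpu_released = true;
}

static inline void scx_put_prev_set_next(struct rq *rq,
					 struct task_struct *prev,
					 struct task_struct *next)
{
	(void)prev;	/* unused here; kept to match the suggested signature */

	if (!scx_enabled())
		return;

	/* Bumped on every transition, not only scx -> !scx. */
	rq->pnt_seq++;

	if (next->sched_class == &ext_sched_class)
		return;

	scx_switch_class(rq, next);
}
```

The cheap scx_enabled() test stays inline in the caller and only the
non-SCX case goes out of line.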
Thanks.
--
tejun