Message-ID: <0144ab66963248cf8587c47bf900aabb@honor.com>
Date: Sun, 20 Jul 2025 09:20:22 +0000
From: liuwenfang <liuwenfang@...or.com>
To: 'Tejun Heo' <tj@...nel.org>
CC: 'David Vernet' <void@...ifault.com>, 'Andrea Righi' <arighi@...dia.com>,
'Changwoo Min' <changwoo@...lia.com>, 'Ingo Molnar' <mingo@...hat.com>,
'Peter Zijlstra' <peterz@...radead.org>, 'Juri Lelli'
<juri.lelli@...hat.com>, 'Vincent Guittot' <vincent.guittot@...aro.org>,
'Dietmar Eggemann' <dietmar.eggemann@....com>, 'Steven Rostedt'
<rostedt@...dmis.org>, 'Ben Segall' <bsegall@...gle.com>, 'Mel Gorman'
<mgorman@...e.de>, 'Valentin Schneider' <vschneid@...hat.com>,
"'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2 1/2] sched_ext: Fix cpu_released while RT task and SCX
task are scheduled concurrently
Thanks for your feedback.
>
> Hello,
>
> My apologies for the really late reply. I've been off work and ended up a lot more
> offline than I expected.
>
> On Sat, Jun 28, 2025 at 06:50:32AM +0000, liuwenfang wrote:
> > Suppose RT task (RT1) is running on CPU0 and RT task (RT2) is being
> > woken up on CPU1. RT1 goes to sleep, SCX task (SCX1) is dispatched to
> > CPU0, and RT2 is placed on CPU0:
> >
> > CPU0(schedule)                            CPU1(try_to_wake_up)
> > set_current_state(TASK_INTERRUPTIBLE)     try_to_wake_up # RT2
> > __schedule                                select_task_rq # CPU0 is selected
> > LOCK rq(0)->lock # lock CPU0 rq           ttwu_queue
> > deactivate_task # RT1                     LOCK rq(0)->lock # busy waiting
> > pick_next_task # no more RT tasks on rq   |
> > prev_balance                              |
> > balance_scx                               |
> > balance_one                               |
> > rq->scx.cpu_released = false;             |
> > consume_global_dsq                        |
> > consume_dispatch_q                        |
> > consume_remote_task                       |
> > UNLOCK rq(0)->lock                        V
> >                                           LOCK rq(0)->lock # succ
> > deactivate_task # SCX1                    ttwu_do_activate
> > LOCK rq(0)->lock # busy waiting           activate_task # RT2 enqueued
> > |                                         UNLOCK rq(0)->lock
> > V
> > LOCK rq(0)->lock # succ
> > activate_task # SCX1
> > pick_task # RT2 is picked
> > put_prev_set_next_task # prev is RT1, next is RT2,
> >                        # rq->scx.cpu_released = false
> > UNLOCK rq(0)->lock
> >
> > In the end, RT2 is running on CPU0 with rq->scx.cpu_released being false!
> >
> > So, add scx_next_task_picked() and check the sched class again to fix
> > the value of rq->scx.cpu_released.
>
> Yeah, the problem and diagnosis look correct to me. It'd be nice if we didn't have
> to add an explicit hook, but ops.cpu_acquire() needs to be called before
> dispatching to the CPU and then we can lose the CPU while doing ops.pick_task().
>
> > Signed-off-by: l00013971 <l00013971@...onor.com>
>
> Can you please use "FIRST_NAME LAST_NAME <EMAIL>" when signing off?
>
> > -static void switch_class(struct rq *rq, struct task_struct *next)
> > +static void switch_class(struct rq *rq, struct task_struct *next,
> > +bool prev_on_scx)
> > {
> > const struct sched_class *next_class = next->sched_class;
> >
> > @@ -3197,7 +3197,8 @@ static void switch_class(struct rq *rq, struct task_struct *next)
> > * kick_cpus_irq_workfn() who is waiting for this CPU to perform a
> > * resched.
> > */
> > - smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
> > + if (prev_on_scx)
> > + smp_store_release(&rq->scx.pnt_seq, rq->scx.pnt_seq + 1);
>
> This is obviously broken as it stands: the seq is currently only incremented on
> scx -> !scx transitions, but it should be incremented on all transitions. This is
> a breakage introduced by b999e365c298 ("sched, sched_ext: Replace
> scx_next_task_picked() with sched_class->switch_class()").
Thanks for the suggestion.
>
> > +void scx_next_task_picked(struct rq *rq, struct task_struct *prev,
> > + struct task_struct *next)
> > +{
> > + bool prev_on_scx = prev && (prev->sched_class == &ext_sched_class);
>
> I don't think @prev or @next can ever be NULL, can they?
@prev always has a valid value in the core scheduler routine.
>
> > +
> > + if (!scx_enabled() ||
>
> Let's make this an inline function in ext.h. The pnt_seq update should be moved
> here after scx_enabled() test, I think. This probably should be a separate patch.
Makes sense. Thanks for the suggestion.
>
> > + !next ||
> > + next->sched_class == &ext_sched_class)
> > + return;
> > +
> > + switch_class(rq, next, prev_on_scx);
> > +}
> >
> > static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
> > struct task_struct *next)
> > {
> > @@ -3253,7 +3267,7 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
> > */
> > if (p->scx.slice && !scx_rq_bypassing(rq)) {
> > dispatch_enqueue(&rq->scx.local_dsq, p, SCX_ENQ_HEAD);
> > - goto switch_class;
> > + return;
> ...
> > @@ -2465,6 +2468,8 @@ static inline void put_prev_set_next_task(struct rq *rq,
> >
> > __put_prev_set_next_dl_server(rq, prev, next);
> >
> > + scx_next_task_picked(rq, prev, next);
>
> It's a bit unfortunate that we need to add this hook but I can't see another way
> around it for both the problem you're reporting and the pnt_seq issue.
> Maybe name it scx_put_prev_set_next(rq, prev, next) for consistency?
Makes sense. Thanks for the suggestion.
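
To make sure the review comments are understood the same way, here is a rough
user-space model of the hook shape being discussed. It is only a sketch and not
the actual kernel change. The struct layouts, the scx_enabled_flag variable and
the plain increment of pnt_seq are simplifying assumptions made for
illustration: the kernel uses smp_store_release() for pnt_seq, compares class
priority rather than just testing for "not ext", and also invokes
ops.cpu_release(), none of which is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types (assumption, not the real layout). */
struct sched_class { const char *name; };
static const struct sched_class rt_sched_class  = { "rt" };
static const struct sched_class ext_sched_class = { "ext" };

struct task_struct { const struct sched_class *sched_class; };

struct rq {
	struct {
		bool cpu_released;     /* true while a higher class owns the CPU */
		unsigned long pnt_seq; /* bumped on every task switch */
	} scx;
};

/* Hypothetical switch modelling the kernel's scx_enabled() test. */
static bool scx_enabled_flag = true;
static bool scx_enabled(void) { return scx_enabled_flag; }

/*
 * Model of the hook called from put_prev_set_next_task() for every
 * prev -> next switch, regardless of which class prev belongs to.
 */
static inline void scx_put_prev_set_next(struct rq *rq,
					 struct task_struct *prev,
					 struct task_struct *next)
{
	/* Per the review, neither task is expected to be NULL here. */
	assert(prev && next);

	if (!scx_enabled())
		return;

	/* Bump the sequence on all transitions, not only scx -> !scx. */
	rq->scx.pnt_seq++;

	/* Simplification: any non-ext next task means SCX lost the CPU. */
	if (next->sched_class != &ext_sched_class)
		rq->scx.cpu_released = true;
}
```

In the scenario from the changelog, prev is RT1 and next is RT2, so
put_prev_task_scx() is never reached; in this model the hook still runs and
flips cpu_released back to true.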
>
> Thanks.
>
> --
> Tejun
--
Regards.
wenfang