linux-kernel - Re: [PATCH] sched_ext: Fix cpu_released while RT task and SCX task are scheduled concurrently

Message-ID: <aFmwHzO2AKFXO_YS@slm.duckdns.org>
Date: Mon, 23 Jun 2025 09:50:55 -1000
From: 'Tejun Heo' <tj@...nel.org>
To: liuwenfang <liuwenfang@...or.com>
Cc: 'David Vernet' <void@...ifault.com>, 'Andrea Righi' <arighi@...dia.com>,
	'Changwoo Min' <changwoo@...lia.com>,
	'Ingo Molnar' <mingo@...hat.com>,
	'Peter Zijlstra' <peterz@...radead.org>,
	'Juri Lelli' <juri.lelli@...hat.com>,
	'Vincent Guittot' <vincent.guittot@...aro.org>,
	'Dietmar Eggemann' <dietmar.eggemann@....com>,
	'Steven Rostedt' <rostedt@...dmis.org>,
	'Ben Segall' <bsegall@...gle.com>, 'Mel Gorman' <mgorman@...e.de>,
	'Valentin Schneider' <vschneid@...hat.com>,
	"'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] sched_ext: Fix cpu_released while RT task and SCX task
 are scheduled concurrently

Hello,

On Sat, Jun 21, 2025 at 04:09:55AM +0000, liuwenfang wrote:
> Supposed RT task(rt1) is running on one CPU with its rq->scx.cpu_released
> set to true, if the rt1 becomes sleeping, then the scheduler will balance
> the remote SCX task(scx1) because there is no other RT task on its rq,
> and rq->scx.cpu_released is false. While one RT task(rt2) is placed on
> this rq(maybe rt2 wakeup or migration occurs) before the scx1 is enqueued,
> then the scheduler will pick rt2. At last, rt2 will be running on this cpu
> with rq->scx.cpu_released being false!
> The main reason is that consume_remote_task() will unlock rq lock.

This is rather difficult to follow. Can you please break this down to a
table? People often use a format like the following:

         CPU X                           CPU Y
  A does something
                                  B does something else
  ...
                                  ...
  Boom

> @@ -2470,6 +2471,11 @@ static inline void put_prev_set_next_task(struct rq *rq,
>  
>  	prev->sched_class->put_prev_task(rq, prev, next);
>  	next->sched_class->set_next_task(rq, next, true);
> +
> +#ifdef CONFIG_SCHED_CLASS_EXT
> +	if (scx_enabled())
> +		switch_class(rq, next);
> +#endif

You're right that there is a race condition around this and I can't see a
way to solve this in SCX proper as there's no way for balance() to tell
whether a higher priority sched class has queued something while balance()
dropped the rq lock for migration, so adding a hook to
put_prev_set_next_task() seems like a reasonable solution. However, can you
please do the following?

- Improve the description so that the race condition is clearly
  understandable and explain why the extra hook in put_prev_set_next_task()
  is necessary.

- Rename switch_class() to something which fits the new location better -
  maybe scx_put_prev_set_next_task().

- If the function is called from put_prev_set_next_task(), it doesn't need
  to be called from put_prev_task_scx(). Drop that call.
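
For illustration only, the sequence described above can be replayed in a
small user-space model. This is a sketch, not kernel code: struct toy_rq,
toy_put_prev_set_next_task(), toy_replay_race() and the with_hook switch are
hypothetical stand-ins, and the rule that the hook sets cpu_released whenever
a non-SCX class takes over the CPU is an assumption based on the patch
description rather than the actual implementation.

```c
#include <stdbool.h>

/* Toy model only. Every name here is a hypothetical stand-in. */
enum toy_class { TOY_RT, TOY_SCX };

struct toy_rq {
	bool cpu_released;	/* models rq->scx.cpu_released */
};

/*
 * Models the proposed hook (assumed behavior): if the next task does not
 * belong to SCX, record that the CPU has been released to another class.
 */
static void toy_scx_put_prev_set_next_task(struct toy_rq *rq,
					   enum toy_class next)
{
	if (next != TOY_SCX && !rq->cpu_released)
		rq->cpu_released = true;
}

/* Models put_prev_set_next_task(), with or without the extra hook. */
static void toy_put_prev_set_next_task(struct toy_rq *rq,
				       enum toy_class next, bool with_hook)
{
	if (with_hook)
		toy_scx_put_prev_set_next_task(rq, next);
}

/*
 * Replays the reported sequence on one CPU and returns the final value of
 * cpu_released while rt2 is running.
 */
static bool toy_replay_race(bool with_hook)
{
	/* rt1 is running, so the CPU is marked as released. */
	struct toy_rq rq = { .cpu_released = true };

	/* rt1 sleeps; balance expects to run scx1 and clears the flag. */
	rq.cpu_released = false;

	/*
	 * The rq lock is dropped for the remote migration and rt2 is queued
	 * in that window, so the pick selects rt2 instead of scx1.
	 */
	toy_put_prev_set_next_task(&rq, TOY_RT, with_hook);

	return rq.cpu_released;
}
```

In this model, toy_replay_race(false) leaves the flag false while rt2 runs,
which is the reported symptom, and toy_replay_race(true) leaves it true.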

Thanks.

-- 
tejun