Message-ID: <20250819074736.GD3245006@noisy.programming.kicks-ass.net>
Date: Tue, 19 Aug 2025 09:47:36 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: liuwenfang <liuwenfang@...or.com>
Cc: 'Tejun Heo' <tj@...nel.org>, 'David Vernet' <void@...ifault.com>,
	'Andrea Righi' <arighi@...dia.com>,
	'Changwoo Min' <changwoo@...lia.com>,
	'Ingo Molnar' <mingo@...hat.com>,
	'Juri Lelli' <juri.lelli@...hat.com>,
	'Vincent Guittot' <vincent.guittot@...aro.org>,
	'Dietmar Eggemann' <dietmar.eggemann@....com>,
	'Steven Rostedt' <rostedt@...dmis.org>,
	'Ben Segall' <bsegall@...gle.com>, 'Mel Gorman' <mgorman@...e.de>,
	'Valentin Schneider' <vschneid@...hat.com>,
	"'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 2/3] sched_ext: Fix cpu_released while RT task and SCX
 task are scheduled concurrently


Could you please not thread your new patches onto the old thread? That
makes them near impossible to find.

On Tue, Aug 19, 2025 at 06:55:38AM +0000, liuwenfang wrote:
> Suppose an RT task (RT1) is running on CPU0 and another RT task (RT2) is woken up
> on CPU1. RT1 goes to sleep and an SCX task (SCX1) is about to be dispatched to CPU0,
> while RT2 is also placed on CPU0:
> 
> CPU0(schedule)                                     CPU1(try_to_wake_up)
> set_current_state(TASK_INTERRUPTIBLE)              try_to_wake_up # RT2
> __schedule                                           select_task_rq # CPU0 is selected
> LOCK rq(0)->lock # lock CPU0 rq                        ttwu_queue
>   deactivate_task # RT1                                  LOCK rq(0)->lock # busy waiting
>     pick_next_task # no more RT tasks on rq                 |
>       prev_balance                                          |
>         balance_scx                                         |
>           balance_one                                       |
>             rq->scx.cpu_released = false;                   |
>               consume_global_dsq                            |
>                 consume_dispatch_q                          |
>                   consume_remote_task                       |
>                     UNLOCK rq(0)->lock                      V
>                                                          LOCK rq(0)->lock # succ
>                     deactivate_task # SCX1               ttwu_do_activate
>                     LOCK rq(0)->lock # busy waiting      activate_task # RT2 enqueued
>                        |                                 UNLOCK rq(0)->lock
>                        V
>                     LOCK rq(0)->lock # succ
>                     activate_task # SCX1
>       pick_task # RT2 is picked
>       put_prev_set_next_task # prev is RT1, next is RT2, rq->scx.cpu_released = false;
> UNLOCK rq(0)->lock
> 
> In the end, RT2 runs on CPU0 with rq->scx.cpu_released still false, which can
> lead the BPF scheduler to make wrong decisions.
> 
> So, check the sched class in __put_prev_set_next_scx() to fix the value of
> rq->scx.cpu_released.

Oh gawd, this is terrible.

Why would you start the pick in balance and then cry when you fail the
pick in pick ?!?

This is also the reason you need that weird CLASS_EXT exception in
prev_balance(), isn't it?

You're now asking for a 3rd call out to do something like:

  ->balance() -- ready a task for pick
  ->pick() -- picks a random other task
  ->put_prev() -- oops, our task didn't get picked, stick it back

Which is bloody ludicrous. So no. We're not doing this.

Why can't pick DTRT ?
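
The balance/pick/put_prev sequence discussed above can be sketched as a toy model. This is only an illustration under stated assumptions, not kernel code: every toy_* name is hypothetical, the run queue is reduced to two counters, and the RT wakeup is simply called between the balance and pick steps to stand in for the window where the rq lock is dropped. It shows the flag being left false when an RT task wins the pick, and the kind of third callout that would be needed to repair it afterwards.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
enum toy_class { TOY_RT, TOY_EXT };

struct toy_rq {
	bool cpu_released;	/* stand-in for rq->scx.cpu_released */
	int nr_rt;		/* runnable RT tasks */
	int nr_ext;		/* SCX tasks readied by the balance step */
};

/* "balance" step: the ext class claims the CPU and readies a task. */
static void toy_balance(struct toy_rq *rq)
{
	rq->cpu_released = false;
	rq->nr_ext++;
}

/* An RT wakeup that lands while the rq lock is dropped during balance. */
static void toy_rt_wakeup(struct toy_rq *rq)
{
	rq->nr_rt++;
}

/* "pick" step: a higher-priority class always wins. */
static enum toy_class toy_pick(const struct toy_rq *rq)
{
	return rq->nr_rt > 0 ? TOY_RT : TOY_EXT;
}

/* The extra callout: undo the claim if the ext task was not picked. */
static void toy_put_prev_fixup(struct toy_rq *rq, enum toy_class next)
{
	if (next != TOY_EXT)
		rq->cpu_released = true;
}

/* Runs one scheduling pass and returns the final cpu_released value. */
bool toy_run(bool rt_wakes_during_balance, bool apply_fixup)
{
	/* RT1 was running before, so the CPU starts out released. */
	struct toy_rq rq = { .cpu_released = true, .nr_rt = 0, .nr_ext = 0 };

	toy_balance(&rq);
	if (rt_wakes_during_balance)
		toy_rt_wakeup(&rq);

	enum toy_class next = toy_pick(&rq);
	if (apply_fixup)
		toy_put_prev_fixup(&rq, next);

	return rq.cpu_released;
}

/* Small self-check covering the cases from the thread. */
void toy_self_check(void)
{
	assert(toy_run(false, false) == false);	/* SCX task runs, flag is correct */
	assert(toy_run(true, false) == false);	/* RT task runs, flag is stale */
	assert(toy_run(true, true) == true);	/* fixup repairs it after the fact */
}
```

In this model the stale value only appears because the flag is written in the balance step, before the pick is known; if the same update were made at pick time there would be nothing to undo.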
