Message-ID: <aKUWoePcNPcnJT1D@slm.duckdns.org>
Date: Tue, 19 Aug 2025 14:28:17 -1000
From: 'Tejun Heo' <tj@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: liuwenfang <liuwenfang@...or.com>, 'David Vernet' <void@...ifault.com>,
	'Andrea Righi' <arighi@...dia.com>,
	'Changwoo Min' <changwoo@...lia.com>,
	'Ingo Molnar' <mingo@...hat.com>,
	'Juri Lelli' <juri.lelli@...hat.com>,
	'Vincent Guittot' <vincent.guittot@...aro.org>,
	'Dietmar Eggemann' <dietmar.eggemann@....com>,
	'Steven Rostedt' <rostedt@...dmis.org>,
	'Ben Segall' <bsegall@...gle.com>, 'Mel Gorman' <mgorman@...e.de>,
	'Valentin Schneider' <vschneid@...hat.com>,
	"'linux-kernel@...r.kernel.org'" <linux-kernel@...r.kernel.org>,
	Joel Fernandes <joelagnelf@...dia.com>
Subject: Re: [PATCH v4 2/3] sched_ext: Fix cpu_released while RT task and SCX
 task are scheduled concurrently

Hello, Peter.

(cc'ing Joel for the @rf addition to pick_task())

On Tue, Aug 19, 2025 at 09:47:36AM +0200, Peter Zijlstra wrote:
...
> You're now asking for a 3rd call out to do something like:
> 
>   ->balance() -- ready a task for pick
>   ->pick() -- picks a random other task
>   ->put_prev() -- oops, our task didn't get picked, stick it back
> 
> Which is bloody ludicrous. So no. We're not doing this.
> 
> Why can't pick DTRT ?

This is unfortunate, but, given how things are set up right now, I think we
probably need the last one. Taking a step back and also considering the
proposed @rf addition to pick():

- The reason why SCX needs to do most of its dispatch operations in
  balance() is because the kernel side doesn't know which tasks are going to
  execute on which CPU until the task is actually picked for execution, so
  all picking must be preceded by balance() where tasks can be moved across
  rqs.

- There's a gap between balance() and pick_task() where a successful return
  from balance() doesn't guarantee that the corresponding pick() would be
  called. This seems intentional to guarantee that no matter what happens
  during balance(), pick_task() of the highest priority class with a pending
  task is guaranteed to get the CPU.

  This guarantee changes if we add @rf to pick_task() and let it unlock and
  relock. A higher priority task may get queued while the rq lock is
  released and then the lower priority pick_task() may still return a task
  of its own. This should be resolvable although it may not be completely
  trivial. We need to shift clear_tsk_need_resched() to before the
  pick_task() calls, and wakeup_preempt() would probably need more
  complications to guarantee that resched_curr() is not skipped while
  scheduling is taking place.

- SCX's ops.cpu_acquire() and .cpu_release() are to tell the BPF scheduler
  that a CPU is available for running SCX tasks or not. We want to tell the
  BPF side that a CPU became available before its ops.dispatch() is called -
  ie. before balance(). So, IIUC, this is where the problem is. Because
  there's a gap between balance() and pick_task(), the CPU might get taken
  by a higher priority sched class in between. If that happens, we need to
  tell the BPF scheduler that it lost the CPU. However, if the previous task
  wasn't an SCX one, there's currently no place to do so.

  IOW, SCX needs to invoke ops.cpu_release() when a CPU is taken between
  its balance() and pick_task(); however, that can happen when both prev and
  next tasks are !SCX tasks, so it needs something which is always called.
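
To make the gap concrete, here is a toy userspace model of the sequence
described above. It is only a sketch: every name in it (toy_rq,
toy_scx_balance(), toy_scx_after_pick() and so on) is made up for
illustration and is not a kernel or sched_ext API, and it ignores locking
and how the higher priority task actually gets queued. The hypothetical
toy_scx_after_pick() stands in for the "something which is always called".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or helpers are kernel APIs. */
enum toy_class { TOY_RT, TOY_SCX, TOY_NR_CLASSES }; /* highest priority first */

struct toy_rq {
	int nr_queued[TOY_NR_CLASSES];	/* runnable tasks per class */
	bool scx_cpu_acquired;		/* BPF side believes it has the CPU */
	int nr_acquire_calls;		/* stands in for ops.cpu_acquire() */
	int nr_release_calls;		/* stands in for ops.cpu_release() */
};

/* Models SCX balance(): announce the CPU to the BPF side, then dispatch. */
bool toy_scx_balance(struct toy_rq *rq)
{
	if (!rq->scx_cpu_acquired) {
		rq->scx_cpu_acquired = true;
		rq->nr_acquire_calls++;
	}
	return rq->nr_queued[TOY_SCX] > 0;
}

/* Models the class walk: the highest priority class with a task wins. */
enum toy_class toy_pick(const struct toy_rq *rq)
{
	for (int c = 0; c < TOY_NR_CLASSES; c++)
		if (rq->nr_queued[c] > 0)
			return (enum toy_class)c;
	return TOY_NR_CLASSES;		/* nothing runnable */
}

/* Hypothetical hook that runs after every pick, whatever prev and next are. */
void toy_scx_after_pick(struct toy_rq *rq, enum toy_class picked)
{
	if (picked < TOY_SCX && rq->scx_cpu_acquired) {
		rq->scx_cpu_acquired = false;
		rq->nr_release_calls++;
	}
}

/* One pass; rt_wakeup_in_gap queues an RT task between balance and pick. */
enum toy_class toy_schedule(struct toy_rq *rq, bool rt_wakeup_in_gap)
{
	assert(rq != NULL);
	toy_scx_balance(rq);
	if (rt_wakeup_in_gap)
		rq->nr_queued[TOY_RT]++;
	enum toy_class picked = toy_pick(rq);
	toy_scx_after_pick(rq, picked);
	return picked;
}
```

Without the toy_scx_after_pick() call, a pass with rt_wakeup_in_gap set
would leave scx_cpu_acquired true even though the RT class took the CPU,
which is the stale state the BPF scheduler would otherwise be left with.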

If @rf is added to pick_task() so that we can merge balance() into
pick_task(), that'd simplify these. SCX wouldn't need balance index
boosting and could handle cpu_acquire/release() within pick_task(). What do
you think?
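
The caveat about a pick_task() that unlocks and relocks can be sketched in
the same toy style. This is an assumption-laden model, not kernel code: the
sim_* names are invented, need_resched is a plain bool standing in for the
current task's resched flag, and the lock itself is not modeled. It only
shows why the point at which the flag is cleared matters once a wakeup can
land in the middle of a pick.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are illustrative, not kernel APIs. */
struct sim_rq {
	int nr_rt;		/* queued higher priority (RT) tasks */
	int nr_scx;		/* queued SCX tasks */
	bool need_resched;	/* stands in for the current task's flag */
};

/* A wakeup of a higher priority task asks for a reschedule. */
void sim_rt_wakeup(struct sim_rq *rq)
{
	rq->nr_rt++;
	rq->need_resched = true;
}

/*
 * Models an SCX pick_task() that drops and retakes the rq lock. If an RT
 * task is queued in that window, this pick still returns an SCX task.
 */
bool sim_scx_pick(struct sim_rq *rq, bool rt_arrives_while_unlocked)
{
	if (rt_arrives_while_unlocked)
		sim_rt_wakeup(rq);
	return rq->nr_scx > 0;
}

/*
 * One scheduling pass. Returns true if a reschedule request is still
 * pending afterwards, i.e. the late RT wakeup has not been lost.
 */
bool sim_schedule_pass(struct sim_rq *rq, bool clear_before_pick,
		       bool rt_arrives_while_unlocked)
{
	assert(rq != NULL);
	if (clear_before_pick)
		rq->need_resched = false;
	if (rq->nr_rt == 0)
		sim_scx_pick(rq, rt_arrives_while_unlocked);
	if (!clear_before_pick)
		rq->need_resched = false;	/* wipes the mid-pick request */
	return rq->need_resched;
}
```

In this model, clearing before the pick keeps the request raised by the
late RT wakeup, while clearing after the pick drops it and the SCX task
keeps the CPU with nothing pending to correct that.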

Thanks.

-- 
tejun
