Message-ID: <CANDhNCrfi7ZPv6vWOqHK+vkNYRBC0LzJdu6X3DsLO0r2TVLL9A@mail.gmail.com>
Date: Wed, 19 Nov 2025 23:27:30 -0800
From: John Stultz <jstultz@...gle.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: LKML <linux-kernel@...r.kernel.org>, Joel Fernandes <joelagnelf@...dia.com>,
Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <vschneid@...hat.com>, Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman <mgorman@...e.de>,
Will Deacon <will@...nel.org>, Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>,
"Paul E. McKenney" <paulmck@...nel.org>, Metin Kaya <Metin.Kaya@....com>,
Xuewen Yan <xuewen.yan94@...il.com>, Thomas Gleixner <tglx@...utronix.de>,
Daniel Lezcano <daniel.lezcano@...aro.org>, Suleiman Souhlal <suleiman@...gle.com>,
kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>, kernel-team@...roid.com
Subject: Re: [PATCH v23 6/9] sched: Handle blocked-waiter migration (and
return migration)
On Wed, Nov 19, 2025 at 11:16 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
> On 11/20/2025 12:03 PM, John Stultz wrote:
> > On Wed, Nov 19, 2025 at 6:55 PM K Prateek Nayak <kprateek.nayak@....com> wrote:
> >> On 11/20/2025 7:30 AM, John Stultz wrote:
> >>>> Ok, so you're suggesting maybe putting the
> >>>> if (task_on_rq_migrating(owner))
> >>>> case ahead of the
> >>>> if (owner_cpu != this_cpu)
> >>>> check?
> >>>>
> >>>> Let me give that a whirl and see how it does.
> >>>
> >>> That said, thinking on it another second, I also realize that once
> >>> we decide to proxy_migrate, there is always the chance the owner
> >>> gets migrated somewhere else. So we can check task_on_rq_migrating(),
> >>> but right after we check it the owner might still be migrated, and we
> >>> can't really prevent this. In that case, doing the proxy-migration to
> >>> the wrong place will be ok, as that cpu will then bounce the tasks to
> >>> the owner's new cpu.
> >>>
> >>> Hopefully this would be rare though. :)
> >>
> >> Ack! I was just thinking of some extreme scenarios. We can probably
> >> think about it if and when we run into a problem with it :)
> >>
> >> That said, once we decide to move the first donors to the owner's
> >> CPU, should we take some care to retain the owner on the same CPU as
> >> much as possible - take it out of the purview of load balancing and
> >> only move it if the owner is no longer runnable on that CPU as a
> >> result of affinity changes?
> >
> > Eh, I'm hesitant to muck with the balancing effects on the lock
> > owners. If it's better for them to move around, then the donor chain
> > should follow along (which will happen naturally).
>
> So assume the case where you have the owner and a bunch of blocked
> donors on the same rq. This rq appears to be the busiest to the load
> balancer.
>
> The load balancer goes through the task list and finds that almost
> everything is blocked on the owner. Then it arrives at the owner in a
> preempted state (queued; not running) and thinks it is a good enough
> task to move to reduce the imbalance.
>
> Now, this triggers a whole chain migration at pick time for all the
> blocked donors to the new CPU. That seems wasteful (although, again,
> it is very unlikely for the owner not to be on_cpu with so many
> donors on its CPU).
Yeah. This is a case we will probably need some tuning for. I'd lean
more towards trying not to consider the blocked_on tasks for
balancing, instead of trying to lock the owner down.
As always, I appreciate the thoughts and feedback!
-john