linux-kernel - Re: [PATCH v23 6/9] sched: Handle blocked-waiter migration (and return migration)

Message-ID: <b75d5eab-3211-43d4-8534-987707559710@amd.com>
Date: Mon, 10 Nov 2025 10:17:49 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: John Stultz <jstultz@...gle.com>
CC: LKML <linux-kernel@...r.kernel.org>, Joel Fernandes
	<joelagnelf@...dia.com>, Qais Yousef <qyousef@...alina.io>, Ingo Molnar
	<mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
	<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>, Valentin Schneider
	<vschneid@...hat.com>, Steven Rostedt <rostedt@...dmis.org>, Ben Segall
	<bsegall@...gle.com>, Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman
	<mgorman@...e.de>, Will Deacon <will@...nel.org>, Waiman Long
	<longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, "Paul E. McKenney"
	<paulmck@...nel.org>, Metin Kaya <Metin.Kaya@....com>, Xuewen Yan
	<xuewen.yan94@...il.com>, Thomas Gleixner <tglx@...utronix.de>, "Daniel
 Lezcano" <daniel.lezcano@...aro.org>, Suleiman Souhlal <suleiman@...gle.com>,
	kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>,
	<kernel-team@...roid.com>
Subject: Re: [PATCH v23 6/9] sched: Handle blocked-waiter migration (and
 return migration)

Hello John,

On 11/8/2025 4:48 AM, John Stultz wrote:
>>> @@ -6689,26 +6834,41 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
>>>                       return NULL;
>>>               }
>>>
>>> +             if (task_current(rq, p))
>>> +                     curr_in_chain = true;
>>> +
>>>               owner = __mutex_owner(mutex);
>>>               if (!owner) {
>>>                       /*
>>> -                      * If there is no owner, clear blocked_on
>>> -                      * and return p so it can run and try to
>>> -                      * acquire the lock
>>> +                      * If there is no owner, either clear blocked_on
>>> +                      * and return p (if it is current and safe to
>>> +                      * just run on this rq), or return-migrate the task.
>>>                        */
>>> -                     __clear_task_blocked_on(p, mutex);
>>> -                     return p;
>>> +                     if (task_current(rq, p)) {
>>> +                             __clear_task_blocked_on(p, NULL);
>>> +                             return p;
>>> +                     }
>>> +                     action = NEEDS_RETURN;
>>> +                     break;
>>>               }
>>>
>>>               if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
>>
>> Should we handle task_on_rq_migrating() in a similar way?
>> Wait for the owner to finish migrating and look at
>> task_cpu(owner) once it is reliable?
> 
> Hrm. I'm not quite sure I understand your suggestion here. Could you
> expand on it a bit? Are you thinking we should deactivate the donor
> when the owner is migrating? What would then return the donor to the
> runqueue? Just rescheduling idle so that we drop the rq lock
> momentarily should be sufficient to make sure the owner can finish
> migrating.

In find_proxy_task() we have:

  if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
    /* Returns rq->idle or NULL */
  }

  /*
   * The owner can be task_on_rq_migrating() at this point
   * if it is in turn blocked on a lock whose owner is on
   * a different CPU.
   */

  owner_cpu = task_cpu(owner); /* may still be the previous CPU */
  if (owner_cpu != this_cpu) {
    ...
    action = MIGRATE;
    break;
  }


So we may end up migrating to the owner's previous CPU, which
then has to do another chain migration. I'm probably
overthinking a very unlikely scenario here :)
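A toy hop count for that scenario, assuming the owner is moving from CPU 1 to CPU 2 and finishes its migration right after the first (stale) task_cpu() read. struct owner_state, observed_cpu() and donor_hops() are hypothetical helpers for illustration only; the model also only covers a donor that is not sitting on the CPU the owner is leaving.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical owner state as seen by a CPU walking the proxy chain. */
struct owner_state {
	int prev_cpu;	/* CPU the owner is leaving */
	int new_cpu;	/* CPU the owner is being moved to */
	bool migrating;	/* task_on_rq_migrating(owner) */
};

/* What task_cpu(owner) reports: the previous CPU until migration ends. */
static int observed_cpu(const struct owner_state *o)
{
	return o->migrating ? o->prev_cpu : o->new_cpu;
}

/*
 * Count donor migrations needed to reach the owner's CPU, assuming the
 * owner's migration completes right after the first observation.
 */
static int donor_hops(struct owner_state o, int donor_cpu)
{
	int hops = 0;

	/* Model only the "owner leaves some other CPU" case. */
	assert(!o.migrating || donor_cpu != o.prev_cpu);

	while (observed_cpu(&o) != donor_cpu) {
		donor_cpu = observed_cpu(&o);	/* action = MIGRATE */
		hops++;
		o.migrating = false;		/* owner's migration completes */
	}
	return hops;
}
```

With the owner already settled on CPU 2, a donor on CPU 0 needs one migration; with the stale read it needs two (CPU 0 to CPU 1, then CPU 1 to CPU 2).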

Unfortunately, I don't have a good way to detect this unless
we add another member to task_struct that tracks task_cpu()
for the most part, but is set to "owner_cpu" as soon as we
know we are taking the "MIGRATE" action, while we are still
holding the "wait_lock"/"blocked_on_lock".
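Roughly, the idea could be modelled as in the sketch below. The field proxy_dest_cpu and the helpers set_task_cpu_model(), mark_proxy_migrate() and proxy_target_cpu() are hypothetical, and locking plus memory ordering are deliberately left out; in the kernel the early update would have to happen under wait_lock/blocked_on_lock as described above.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for task_struct. */
struct task {
	int cpu;		/* what task_cpu() returns */
	int proxy_dest_cpu;	/* hypothetical: tracks cpu, or the MIGRATE target */
};

/* Keep the hypothetical field in sync on ordinary CPU changes. */
static void set_task_cpu_model(struct task *t, int cpu)
{
	t->cpu = cpu;
	t->proxy_dest_cpu = cpu;
}

/*
 * Hypothetically called while still holding wait_lock/blocked_on_lock,
 * as soon as the chain walker decides on action = MIGRATE for @t.
 */
static void mark_proxy_migrate(struct task *t, int owner_cpu)
{
	assert(owner_cpu >= 0);
	t->proxy_dest_cpu = owner_cpu;	/* published before t->cpu changes */
}

/* CPU a chain walker would target when following @owner. */
static int proxy_target_cpu(const struct task *owner)
{
	return owner->proxy_dest_cpu;
}
```

A walker that reads proxy_target_cpu() instead of task_cpu() would, in this model, go straight to the destination even while the owner's cpu still reports the CPU it is leaving, avoiding the extra hop.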

-- 
Thanks and Regards,
Prateek