Message-ID: <CANDhNCpL+3dwSDV_yDfcpW66i2jGwjDF42RW+kKg7XmXLESwzw@mail.gmail.com>
Date: Fri, 7 Nov 2025 09:24:08 -0800
From: John Stultz <jstultz@...gle.com>
To: Juri Lelli <juri.lelli@...hat.com>
Cc: LKML <linux-kernel@...r.kernel.org>, Joel Fernandes <joelagnelf@...dia.com>, 
	Qais Yousef <qyousef@...alina.io>, Ingo Molnar <mingo@...hat.com>, 
	Peter Zijlstra <peterz@...radead.org>, Vincent Guittot <vincent.guittot@...aro.org>, 
	Dietmar Eggemann <dietmar.eggemann@....com>, Valentin Schneider <vschneid@...hat.com>, 
	Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, 
	Zimuzo Ezeozue <zezeozue@...gle.com>, Mel Gorman <mgorman@...e.de>, Will Deacon <will@...nel.org>, 
	Waiman Long <longman@...hat.com>, Boqun Feng <boqun.feng@...il.com>, 
	"Paul E. McKenney" <paulmck@...nel.org>, Metin Kaya <Metin.Kaya@....com>, 
	Xuewen Yan <xuewen.yan94@...il.com>, K Prateek Nayak <kprateek.nayak@....com>, 
	Thomas Gleixner <tglx@...utronix.de>, Daniel Lezcano <daniel.lezcano@...aro.org>, 
	Suleiman Souhlal <suleiman@...gle.com>, kuyo chang <kuyo.chang@...iatek.com>, hupu <hupu.gm@...il.com>, 
	kernel-team@...roid.com
Subject: Re: [PATCH v23 6/9] sched: Handle blocked-waiter migration (and
 return migration)

On Fri, Nov 7, 2025 at 7:19 AM Juri Lelli <juri.lelli@...hat.com> wrote:
> On 30/10/25 00:18, John Stultz wrote:
> > +#ifdef CONFIG_SCHED_PROXY_EXEC
> > +static inline void proxy_set_task_cpu(struct task_struct *p, int cpu)
> > +{
> > +     unsigned int wake_cpu;
> > +
> > +     /*
> > +      * Since we are enqueuing a blocked task on a cpu it may
> > +      * not be able to run on, preserve wake_cpu when we
> > +      * __set_task_cpu so we can return the task to where it
> > +      * was previously runnable.
> > +      */
> > +     wake_cpu = p->wake_cpu;
> > +     __set_task_cpu(p, cpu);
> > +     p->wake_cpu = wake_cpu;
> > +}
> > +#endif /* CONFIG_SCHED_PROXY_EXEC */
> > +
>
> ...
>
> > +static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
> > +                            struct task_struct *p, int target_cpu)
> >  {
> > -     if (!__proxy_deactivate(rq, donor)) {
> > -             /*
> > -              * XXX: For now, if deactivation failed, set donor
> > -              * as unblocked, as we aren't doing proxy-migrations
> > -              * yet (more logic will be needed then).
> > -              */
> > -             clear_task_blocked_on(donor, NULL);
> > +     struct rq *target_rq = cpu_rq(target_cpu);
> > +
> > +     lockdep_assert_rq_held(rq);
> > +
> > +     /*
> > +      * Since we're going to drop @rq, we have to put(@rq->donor) first,
> > +      * otherwise we have a reference that no longer belongs to us.
> > +      *
> > +      * Additionally, as we put_prev_task(prev) earlier, it's possible that
> > +      * prev will migrate away as soon as we drop the rq lock; however, we
> > +      * still have it marked as rq->curr, as we've not yet switched tasks.
> > +      *
> > +      * So call proxy_resched_idle() to let go of the references before
> > +      * we release the lock.
> > +      */
> > +     proxy_resched_idle(rq);
> > +
> > +     WARN_ON(p == rq->curr);
> > +
> > +     deactivate_task(rq, p, 0);
> > +     proxy_set_task_cpu(p, target_cpu);
>
> We use proxy_set_task_cpu() here. BTW, can you comment/expand on why an
> ad-hoc set_task_cpu() is needed for proxy?

Since with proxy we keep blocked waiters on the runqueue,
proxy-migrations may move those lock waiters to cpu runqueues where
they can't run. Ideally, when the mutex is released, we want the
waiter to wake up on the same cpu it would have woken on had it been
blocked normally (i.e. dequeued). So for proxy-migrations, we want to
preserve the wake_cpu value when we change the task_cpu.

For instance, if we were proxy-migrated as a donor to a cpu outside of
our affinity set, we don't want the wake_cpu hint to suggest that we
should be woken on a cpu we can't run on.
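
(For context on why a wrapper is needed at all: __set_task_cpu()
itself overwrites the hint. Roughly what kernel/sched/sched.h does
today, sketching from memory and modulo refactoring in recent trees:)

    static inline void __set_task_cpu(struct task_struct *p, unsigned int cpu)
    {
            set_task_rq(p, cpu);
            /* ensure prior per-task updates are visible before ->cpu is */
            smp_wmb();
            WRITE_ONCE(task_thread_info(p)->cpu, cpu);
            p->wake_cpu = cpu;      /* <-- clobbers the hint */
    }

so proxy_set_task_cpu() just brackets that call with a save/restore
of p->wake_cpu.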

> > +static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
> > +                            struct task_struct *p)
> > +{
> > +     struct rq *this_rq, *target_rq;
> > +     struct rq_flags this_rf;
> > +     int cpu, wake_flag = 0;
> > +
> > +     lockdep_assert_rq_held(rq);
> > +     WARN_ON(p == rq->curr);
> > +
> > +     get_task_struct(p);
> > +
> > +     /*
> > +      * We have to zap callbacks before unlocking the rq
> > +      * as another CPU may jump in and call sched_balance_rq
> > +      * which can trip the warning in rq_pin_lock() if we
> > +      * leave callbacks set.
> > +      */
> > +     zap_balance_callbacks(rq);
> > +     rq_unpin_lock(rq, rf);
> > +     raw_spin_rq_unlock(rq);
> > +
> > +     /*
> > +      * We drop the rq lock, and re-grab task_rq_lock to get
> > +      * the pi_lock (needed for select_task_rq) as well.
> > +      */
> > +     this_rq = task_rq_lock(p, &this_rf);
> > +     update_rq_clock(this_rq);
> > +
> > +     /*
> > +      * Since we let go of the rq lock, the task may have been
> > +      * woken or migrated to another rq before we got the
> > +      * task_rq_lock. So re-check we're on the same rq. If
> > +      * not, the task has already been migrated and that CPU
> > +      * will handle any further migrations.
> > +      */
> > +     if (this_rq != rq)
> > +             goto err_out;
> > +
> > +     /* Similarly, if we've been dequeued, someone else will wake us */
> > +     if (!task_on_rq_queued(p))
> > +             goto err_out;
> > +
> > +     /*
> > +      * Since we should only be calling here from __schedule()
> > +      * -> find_proxy_task(), no one else should have
> > +      * assigned current out from under us. But check and warn
> > +      * if we see this, then bail.
> > +      */
> > +     if (task_current(this_rq, p) || task_on_cpu(this_rq, p)) {
> > +             WARN_ONCE(1, "%s rq: %i current/on_cpu task %s %d  on_cpu: %i\n",
> > +                       __func__, cpu_of(this_rq),
> > +                       p->comm, p->pid, p->on_cpu);
> > +             goto err_out;
> >       }
> > -     return NULL;
> > +
> > +     proxy_resched_idle(this_rq);
> > +     deactivate_task(this_rq, p, 0);
> > +     cpu = select_task_rq(p, p->wake_cpu, &wake_flag);
> > +     set_task_cpu(p, cpu);
>
> But, then use the 'standard' set_task_cpu() for the return migration. Is
> that intended?

Yep. On return migration, we want the task to be returned to the
runqueue it would have woken on (wake_cpu provides a hint, but
select_task_rq() can always choose somewhere more appropriate). Once
the cpu has been selected, it's fine for wake_cpu to then be set to
the cpu the task is now being assigned to.
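
So the two helpers pair up like this (just a condensed restatement of
the code above, no new logic):

    /* proxy-migration: the target cpu may be outside p's affinity,
     * so keep wake_cpu pointing at where p was last runnable */
    proxy_set_task_cpu(p, target_cpu);

    /* return migration: a normal wakeup-style placement, so letting
     * set_task_cpu() update wake_cpu to the chosen cpu is fine */
    cpu = select_task_rq(p, p->wake_cpu, &wake_flag);
    set_task_cpu(p, cpu);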

thanks
-john
