Message-ID: <CANDhNCptjorLthhgMBLLCCYsk5MJT73Daj+usFPWNenpgYTF4A@mail.gmail.com>
Date: Wed, 9 Apr 2025 16:35:22 -0700
From: John Stultz <jstultz@...gle.com>
To: hupu <hupu.gm@...il.com>
Cc: linux-kernel@...r.kernel.org, juri.lelli@...hat.com, peterz@...radead.org, 
	vschneid@...hat.com, mingo@...hat.com, vincent.guittot@...aro.org, 
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com, 
	mgorman@...e.de, hupu@...nssion.com
Subject: Re: [RFC 1/1] sched: Remove unreliable wake_cpu check in proxy_needs_return

Hey hupu!
  Thanks so much for taking the time to review the full proxy patch
series and to submit this change!

On Mon, Apr 7, 2025 at 6:47 AM hupu <hupu.gm@...il.com> wrote:
>
> The (p->wake_cpu != cpu_of(rq)) check in proxy_needs_return() is unsafe
> during the early wakeup phase. When called via the ttwu_runnable() path:
>
> |-- try_to_wake_up
>     |-- ttwu_runnable
>         |-- proxy_needs_return    //we are here
>     |-- select_task_rq
>     |-- set_task_cpu              //set p->wake_cpu here
>     |-- ttwu_queue
>
> The p->wake_cpu at this point reflects the CPU where the donor last ran
> before blocking, not the target migration CPU. During the blocking period:
> 1. CPU affinity may have been changed by other threads
> 2. Proxy migrations might have altered the effective wake_cpu
> 3. set_task_cpu() hasn't updated wake_cpu yet in this code path

A few nits here:
1) We do reassign wake_cpu in __set_cpus_allowed_ptr_locked() if necessary.
2) Proxy migrations use proxy_set_task_cpu(), which preserves the
   wake_cpu (roughly as sketched below).

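For reference, the idea behind proxy_set_task_cpu() is something like
the following. This is a minimal sketch of the preservation logic, not
the verbatim series code (__set_task_cpu() is the mainline helper,
which overwrites p->wake_cpu as a side effect):

static void proxy_set_task_cpu(struct task_struct *p, int cpu)
{
        unsigned int wake_cpu = p->wake_cpu;

        /* Move the task over to the rq where it will be proxied... */
        __set_task_cpu(p, cpu);
        /*
         * ...but restore wake_cpu, since __set_task_cpu() set it to
         * @cpu, and a later real wakeup should still target the CPU
         * the task last ran on.
         */
        p->wake_cpu = wake_cpu;
}
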
And I'm not sure I quite understand the concern in #3 above. Could you
clarify further?

> This makes the wake_cpu vs current CPU comparison meaningless and potentially
> dangerous. Rely on find_proxy_task()'s later migration logic to handle CPU
> placement based on up-to-date affinity and scheduler state.

The task_cpu serialization rules are a bit more subtle than I'd like,
so I'll grant there likely could be bugs lurking there, but I'm not
totally sure I understand the case you're thinking of here.

As you noted, in proxy_needs_return() we use the wake_cpu to tell us
if the task is still on the rq where it was blocked, with the hope
that we can short-cut the deactivate & wakeup logic and instead just
mark it BO_RUNNABLE and return from ttwu.
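
To make that concrete, the shape of the check under discussion is
roughly the following. This is an illustrative sketch, not the actual
series code, and the return convention here is just for the sketch:

/* Return true if the task must take the full deactivate & wakeup path. */
static bool proxy_needs_return(struct rq *rq, struct task_struct *p)
{
        /* The comparison hupu is questioning: is wake_cpu stale here? */
        if (p->wake_cpu != cpu_of(rq))
                return true;

        /*
         * Task never left the rq it blocked on, so short-cut ttwu:
         * mark it runnable again right where it is.
         */
        p->blocked_on_state = BO_RUNNABLE;      /* field name illustrative */
        return false;
}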

I'm not sure how "find_proxy_task()'s later migration logic" gets
involved here. Instead, it seems your change would just force us to
always do the deactivate & wakeup (still using task->wake_cpu as the
guide for select_task_rq()).
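
For context, the relevant mainline try_to_wake_up() sequence looks
roughly like this (trimmed from kernel/sched/core.c):

        cpu = select_task_rq(p, p->wake_cpu, &wake_flags);
        if (task_cpu(p) != cpu) {
                wake_flags |= WF_MIGRATED;
                set_task_cpu(p, cpu);
        }
        ttwu_queue(p, cpu, wake_flags);

So wake_cpu is only ever a hint; select_task_rq() gets the final say.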

This would be a more conservative approach, since select_task_rq()
would handle finding an appropriate rq if wake_cpu was not correct, so
I don't disagree with your suggestion. In fact, in getting the v16
series ready, I've reworked find_proxy_task()'s "need_return:" logic
to do the deactivate/wakeup step, because otherwise I was hitting
cases where we would try to do the manual return-migration off to an
offline cpu during cpu-hotplug testing. So a similar approach might
be helpful here to avoid wakeups on offlined cpus (though I need to
look a bit more at the cpu hotplug code to understand how we'd hit
the task->on_rq case and have task_cpu() point at an offlined cpu).
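
The reason the deactivate & wakeup path is naturally safe there is the
fallback already in mainline select_task_rq() (again trimmed, roughly):

        if (p->nr_cpus_allowed > 1 && !is_migration_disabled(p))
                cpu = p->sched_class->select_task_rq(p, cpu, *wake_flags);
        else
                cpu = cpumask_any(p->cpus_ptr);

        if (unlikely(!is_cpu_allowed(p, cpu)))
                cpu = select_fallback_rq(task_cpu(p), p);

is_cpu_allowed() rejects cpus that aren't active, so a stale or
offlined wake_cpu hint gets corrected, while a manual return-migration
has to do that checking itself.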

But I'm a little hesitant to drop the optimization, so if you have a
more specific case that would be problematic, I'd appreciate you
sharing.

thanks
-john
