linux-kernel - Re: [PATCH 2/3] rcu: Defer RCU kthreads wakeup when CPU is dying

Message-ID: <ZYMNUdbFIWaK6T1d@localhost.localdomain>
Date: Wed, 20 Dec 2023 16:50:41 +0100
From: Frederic Weisbecker <frederic@...nel.org>
To: Joel Fernandes <joel@...lfernandes.org>
Cc: LKML <linux-kernel@...r.kernel.org>, Boqun Feng <boqun.feng@...il.com>,
	Neeraj Upadhyay <neeraj.upadhyay@....com>,
	Uladzislau Rezki <urezki@...il.com>,
	Zqiang <qiang.zhang1211@...il.com>, rcu <rcu@...r.kernel.org>,
	"Paul E . McKenney" <paulmck@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH 2/3] rcu: Defer RCU kthreads wakeup when CPU is dying

On Tue, Dec 19, 2023 at 10:01:55PM -0500, Joel Fernandes wrote:
> > (Though right now I'm missing the flush_smp_call_function_queue() call that flushes
> > the ttwu queue between sched_cpu_deactivate() and sched_cpu_wait_empty())
> 
> Possible. I saw your IRC message to Peter on that as well, thanks for
> following up. I need to find some time to look more into that, but that does
> sound concerning.

Found it! It's smpcfd_dying_cpu().

> > But note this patch does something different, it doesn't defer the runqueue
> > enqueue like ttwu queue does. It defers the whole actual wakeup. This means that the
> > decision as to where to queue the task is delegated to an online CPU. So it's
> > not the same constraints. Waking up a task _from_ a CPU that is active or not but
> > at least online is supposed to be fine.
> 
> Agreed, thanks for the clarifications. But along similar lines (and at the
> risk of oversimplifying), is it not possible to send an IPI to an online CPU
> to queue the hrtimer locally there if you detect that the current CPU is
> going down? In the other thread to Hilf, you mentioned the hrtimer infra has
> to have equal or earlier deadline, but you can just queue the hrtimer from
> the IPI handler and that should take care of it?

This is something that Thomas wanted to avoid IIRC, because the IPI can make
it miss the deadline. But I guess in the case of an offline CPU, it can be a
last resort.

> Let me know if I missed something which should make for some good holiday
> reading material. ;-)

Let me summarize the possible fixes we can have:

1) It's RCU's fault! We must check and fix all the wake-ups performed by RCU
   from rcutree_report_cpu_dead(). But beware of other possible wake-ups/timer
   enqueues from the outgoing CPU after hrtimers are migrated.

2) It's scheduler's fault! do_start_rt_bandwidth() should check if the current
   CPU is offline and manually place the timer on an online CPU (through an
   IPI? yuck)

3) It's hrtimer's fault! If the current CPU is offline, it must arrange for
   queueing to an online CPU. Not easy to do as we must find one whose next
   expiry is below or equal to the scheduler timer's expiry. As a last resort,
   the timer could be force queued to any online CPU, which is then signalled
   through an IPI, even though it's something we've tried to avoid until now.

   Also, it's hard for me to think about another way to fix the deadlock fixed
   by 5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94. Hrtimers migration can't happen
   after rcutree_report_cpu_dead(), because it may use RCU...
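
To make the target selection in 3) a bit more concrete, here is a rough
userspace sketch of the idea. It is not kernel code and not a proposed patch:
struct toy_cpu, toy_pick_target() and the ipi_pending flag are made-up names
for illustration, and expiries are plain integers. It only shows the ordering
of the two choices: prefer an online CPU that is already due to fire no later
than the new timer, and fall back to any online CPU plus an IPI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy per-CPU state. None of these names exist in the kernel. */
struct toy_cpu {
	bool online;
	uint64_t next_expiry;	/* earliest timer already armed on this CPU */
	bool ipi_pending;	/* CPU must be kicked to reprogram its clock event */
};

/*
 * Pick a target CPU for a timer enqueued from the offline CPU "self".
 *
 * Preferred: an online CPU whose next expiry is <= expires. Its clock event
 * will already fire early enough, so no IPI is needed.
 * Last resort: the first online CPU found, flagged as needing an IPI because
 * the new timer becomes its earliest one.
 *
 * Returns the chosen CPU index, or -1 if no other CPU is online.
 */
static int toy_pick_target(struct toy_cpu *cpus, int nr, int self,
			   uint64_t expires)
{
	int fallback = -1;

	assert(nr > 0);

	for (int i = 0; i < nr; i++) {
		if (i == self || !cpus[i].online)
			continue;
		if (cpus[i].next_expiry <= expires)
			return i;	/* already fires in time, no IPI */
		if (fallback < 0)
			fallback = i;
	}

	if (fallback >= 0) {
		/* Forced queueing: the target has to reprogram, hence the IPI. */
		cpus[fallback].next_expiry = expires;
		cpus[fallback].ipi_pending = true;
	}
	return fallback;
}
```

With CPU 0 offline, CPU 1 next expiring at 500 and CPU 2 at 100, a timer
expiring at 200 lands on CPU 2 without any IPI, while a timer expiring at 50
is forced onto CPU 1 and marks it for an IPI. The sketch ignores locking and
the window in which the target's next expiry changes under us, which is
exactly the part that makes this hard in the real hrtimer code.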

None of the above look pretty anyway. Thoughts?