Message-ID: <20220209233811.GC557593@lothringen>
Date:   Thu, 10 Feb 2022 00:38:11 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     paulmck@...nel.org
Cc:     kernel-team@...com, linux-kernel@...r.kernel.org,
        quic_mojha@...cinc.com, rcu@...r.kernel.org, rostedt@...dmis.org,
        tj@...nel.org, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH rcu 3/3] rcu: Allow expedited RCU grace periods on
 incoming CPUs

On Fri, Feb 04, 2022 at 02:55:07PM -0800, Paul E. McKenney wrote:
> Although it is usually safe to invoke synchronize_rcu_expedited() from a
> preemption-enabled CPU-hotplug notifier, if it is invoked from a notifier
> between CPUHP_AP_RCUTREE_ONLINE and CPUHP_AP_ACTIVE, its attempts to
> invoke a workqueue handler will hang due to RCU waiting on a CPU that
> the scheduler is not paying attention to.  This commit therefore expands
> use of the existing workqueue-independent synchronize_rcu_expedited()
> from early boot to also include CPUs that are being hotplugged.
> 
> Link: https://lore.kernel.org/lkml/7359f994-8aaf-3cea-f5cf-c0d3929689d6@quicinc.com/
> Reported-by: Mukesh Ojha <quic_mojha@...cinc.com>
> Cc: Tejun Heo <tj@...nel.org>
> Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

I'm surprised by this scheduler behaviour.

Since sched_cpu_activate() hasn't been called yet, rq->balance_callback still
points to balance_push_callback. As a result, balance_push() should be called
at the end of schedule() when the workqueue worker is picked as the next task.
The worker should then be immediately preempted by the stop task and migrated
elsewhere.
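
To make the expectation above concrete, here is a toy user-space model of it.
This is not kernel code: the struct layouts, the toy_*() helpers and the
pushed_away flag are hypothetical simplifications, and the real balance_push()
does more checking before it hands a task to the stop task. It only sketches
the ordering argued above: while the callback is installed, whatever
schedule() picks gets pushed away, and activation removes the callback.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy user-space model, NOT kernel code. Every name here is a simplified
 * stand-in for the kernel symbol it is named after.
 */
struct task {
	const char *name;
	bool pushed_away;	/* hypothetical flag: task was migrated off this CPU */
};

struct rq {
	void (*balance_callback)(struct rq *rq, struct task *next);
	struct task *curr;
};

/* Stand-in for balance_push(): mark the picked task as pushed elsewhere. */
static void balance_push(struct rq *rq, struct task *next)
{
	(void)rq;
	next->pushed_away = true;	/* models preemption by the stop task */
}

/* Stand-in for an incoming CPU on which sched_cpu_activate() has not run. */
static void toy_cpu_incoming(struct rq *rq)
{
	rq->balance_callback = balance_push;
	rq->curr = NULL;
}

/* Stand-in for sched_cpu_activate(): tasks are no longer pushed away. */
static void toy_cpu_activate(struct rq *rq)
{
	rq->balance_callback = NULL;
}

/* Stand-in for the tail of schedule(): run the callback on the picked task. */
static void toy_schedule(struct rq *rq, struct task *next)
{
	rq->curr = next;
	if (rq->balance_callback)
		rq->balance_callback(rq, next);
}
```

In this model, a task passed to toy_schedule() after toy_cpu_incoming() ends
up with pushed_away set, while one scheduled after toy_cpu_activate() does not.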

So I must be missing something. For fun, I booted the following and it
didn't produce any issue:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 80faf2273ce9..b1e74a508881 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4234,6 +4234,8 @@ int rcutree_online_cpu(unsigned int cpu)
 
 	// Stop-machine done, so allow nohz_full to disable tick.
 	tick_dep_clear(TICK_DEP_BIT_RCU);
+	if (cpu != 0)
+		synchronize_rcu_expedited();
 	return 0;
 }
 
