Date:   Tue, 18 Jan 2022 12:06:46 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Mukesh Ojha <quic_mojha@...cinc.com>
Cc:     lkml <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>, tj@...nel.org,
        jiangshanlai@...il.com
Subject: Re: synchronize_rcu_expedited gets stuck in hotplug path

On Tue, Jan 18, 2022 at 05:16:39PM +0530, Mukesh Ojha wrote:
> Hi,
> 
> We are facing an issue in a hotplug test where cpuhp/2 gets stuck in the
> path below [1], in synchronize_rcu_expedited() at state
> CPUHP_AP_ONLINE_DYN, and is not able to proceed.
> We see that wait_rcu_exp_gp() is queued to cpu2, and it looks like it did
> not get a chance to run, as we see it in the pending state on cpu2 [2].
> 
> So, when exactly does cpu2 become available for scheduling in the hotplug
> path? Is it after CPUHP_AP_ACTIVE?
> 
> It looks to be a deadlock here. Can it be fixed by queuing
> wait_rcu_exp_gp() on another wq?
> Or is it a wrong usage of synchronize_rcu() in the hotplug path?
> 
> [1]
> 
> =======================================================
> Process: cpuhp/2, [affinity: 0x4] cpu: 2 pid: 24 start: 0xffffff87803e4a00
> =====================================================
>     Task name: cpuhp/2 [affinity: 0x4] pid: 24 cpu: 2 prio: 120 start: ffffff87803e4a00
>     state: 0x2[D] exit_state: 0x0 stack base: 0xffffffc010160000
>     Last_enqueued_ts:      59.022215498 Last_sleep_ts: 59.022922946
>     Stack:
>     [<ffffffe9f4074354>] __switch_to+0x248
>     [<ffffffe9f5c02474>] __schedule+0x5b0
>     [<ffffffe9f5c02b28>] schedule+0x80
>     [<ffffffe9f42321a4>] synchronize_rcu_expedited+0x1c4
>     [<ffffffe9f423b294>] synchronize_rcu+0x4c
>     [<ffffffe9f6d04ab0>] waltgov_stop[sched_walt]+0x78
>     [<ffffffe9f512fa28>] cpufreq_add_policy_cpu+0xc0
>     [<ffffffe9f512e48c>] cpufreq_online[jt]+0x10f4
>     [<ffffffe9f51323b8>] cpuhp_cpufreq_online+0x14
>     [<ffffffe9f4128d3c>] cpuhp_invoke_callback+0x2f8
>     [<ffffffe9f412c30c>] cpuhp_thread_fun+0x130
>     [<ffffffe9f4187a58>] smpboot_thread_fn+0x180
>     [<ffffffe9f417d98c>] kthread+0x150
>     [<ffffffe9f4013918>] ret_to_user[jt]+0x0
> 
> 
> [2]
> 
> CPU 2
> pool 0
> IDLE Workqueue worker: kworker/2:3 current_work: (None)
> IDLE Workqueue worker: kworker/2:2 current_work: (None)
> IDLE Workqueue worker: kworker/2:1 current_work: (None)
> IDLE Workqueue worker: kworker/2:0 current_work: (None)
> Pending entry: wait_rcu_exp_gp[jt]
> Pending entry: lru_add_drain_per_cpu[jt]
> Pending entry: wq_barrier_func[jt]

Interesting.  Adding Tejun and Lai on CC for their perspective.

As you say, the incoming CPU invoked synchronize_rcu_expedited() which
in turn invoked queue_work().  By default, workqueues will of course
queue that work on the current CPU.  But in this case, the CPU's bit
is not yet set in the cpu_active_mask.  Thus, a work item queued on
the incoming CPU won't run until CPUHP_AP_ACTIVE, which won't
be reached until after the grace period ends, which cannot happen until
the workqueue handler is invoked.
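
To make the circular wait above concrete, here is a minimal userspace
sketch of the dependency chain.  It is only a model, not kernel code:
struct hotplug_model, model_step() and hotplug_completes() are
hypothetical names, and the four rules are a simplification of the
behavior described in this thread.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the dependency chain; not kernel code. */
struct hotplug_model {
	bool incoming_cpu_active; /* incoming CPU's bit is set in cpu_active_mask */
	bool work_ran;            /* the wait_rcu_exp_gp() work item has run */
	bool gp_ended;            /* the expedited grace period has ended */
	bool callback_returned;   /* the blocked hotplug callback has returned */
};

/* Apply every dependency once; return true if any state advanced. */
static bool model_step(struct hotplug_model *m, bool work_on_incoming_cpu)
{
	bool progressed = false;

	/* Assumption: queued work runs only on a CPU that is already active. */
	if (!m->work_ran && (!work_on_incoming_cpu || m->incoming_cpu_active)) {
		m->work_ran = true;
		progressed = true;
	}
	/* The grace period ends only after the work item has run. */
	if (!m->gp_ended && m->work_ran) {
		m->gp_ended = true;
		progressed = true;
	}
	/* The hotplug callback returns only after the grace period ends. */
	if (!m->callback_returned && m->gp_ended) {
		m->callback_returned = true;
		progressed = true;
	}
	/* The incoming CPU becomes active only after the callback returns. */
	if (!m->incoming_cpu_active && m->callback_returned) {
		m->incoming_cpu_active = true;
		progressed = true;
	}
	return progressed;
}

/* Return true if the incoming CPU eventually becomes active in the model. */
static bool hotplug_completes(bool work_on_incoming_cpu)
{
	struct hotplug_model m = { false, false, false, false };

	while (model_step(&m, work_on_incoming_cpu))
		;
	return m.incoming_cpu_active;
}
```

In this model, queuing the work on the incoming CPU never makes
progress, while queuing it on an already-active CPU lets every step
complete.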

I could imagine doing something as shown in the (untested) patch below,
but first does this help?

If it does help, would this sort of check be appropriate here or
should it instead go into workqueues?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 60197ea24ceb9..03c0556b29f22 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -849,7 +849,15 @@ void synchronize_rcu_expedited(void)
 		/* Marshall arguments & schedule the expedited grace period. */
 		rew.rew_s = s;
 		INIT_WORK_ONSTACK(&rew.rew_work, wait_rcu_exp_gp);
-		queue_work(rcu_gp_wq, &rew.rew_work);
+		preempt_disable();
+		if (cpumask_test_cpu(smp_processor_id(), cpu_active_mask)) {
+			preempt_enable();
+			queue_work(rcu_gp_wq, &rew.rew_work);
+		} else {
+			// Avoid incoming CPUs.
+			preempt_enable();
+			queue_work_on(cpumask_first(cpu_active_mask), rcu_gp_wq, &rew.rew_work);
+		}
 	}
 
 	/* Wait for expedited grace period to complete. */
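
The CPU choice made by the (untested) patch above can be modeled in
userspace roughly as follows.  This is a sketch under stated
assumptions: cpumask_model_t, first_active_cpu() and pick_work_cpu()
are hypothetical stand-ins for cpu_active_mask, cpumask_first() and
the queue_work()/queue_work_on() decision, and the mask is limited to
64 CPUs.

```c
#include <stdint.h>

/* Hypothetical stand-in for cpu_active_mask: bit n set means CPU n is active. */
typedef uint64_t cpumask_model_t;

/* Model of cpumask_first(): lowest set bit, or -1 if the mask is empty. */
static int first_active_cpu(cpumask_model_t active)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (active & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/*
 * Model of the patch's decision: keep the work on the current CPU if it
 * is active, otherwise redirect it to the first active CPU.
 * this_cpu must be in the range 0..63.
 */
static int pick_work_cpu(int this_cpu, cpumask_model_t active)
{
	if (active & (UINT64_C(1) << this_cpu))
		return this_cpu;
	return first_active_cpu(active);
}
```

With CPUs 0 and 1 active (mask 0x3), an incoming CPU 2 is redirected
to CPU 0, while an already-active CPU 1 keeps its work local.  The
model returns -1 for an empty mask, a case the patch itself does not
handle.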
