Message-ID: <72a3b3f4-1b74-6c03-9d04-ac4bb721a55a@windriver.com>
Date: Mon, 17 May 2021 09:55:30 +0800
From: "Xu, Yanfei" <yanfei.xu@...driver.com>
To: paulmck@...nel.org
Cc: josh@...htriplett.org, rostedt@...dmis.org,
mathieu.desnoyers@...icios.com, jiangshanlai@...il.com,
joel@...lfernandes.org, rcu@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] rcu: fix a deadlock caused by not release
rcu_node->lock
On 5/17/21 6:58 AM, Paul E. McKenney wrote:
> On Sun, May 16, 2021 at 05:50:10PM +0800, yanfei.xu@...driver.com wrote:
>> From: Yanfei Xu <yanfei.xu@...driver.com>
>>
>> rcu_node->lock isn't released in rcu_print_task_stall() if the rcu_node
>> doesn't contain any tasks that are blocking the GP. However, this
>> rcu_node->lock will be acquired again in rcu_dump_cpu_stacks() soon
>> afterwards when ndetected is non-zero. As a result, the CPU will hang
>> on this deadlock.
>>
>> Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
>> Signed-off-by: Yanfei Xu <yanfei.xu@...driver.com>
>
> Also a good catch, thank you! Queued for further review and testing,
> wordsmithed as shown below. The rcutorture scripts have been known to
> work on ARM in the past, and might still do so. (I test on x86.)
>
> As always, please check to make sure that I didn't mess something up.
>
Looks good to me, Thanks!
Regards,
Yanfei
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit e0a9b77f245ae4fe1537120fd5319bf9e091618e
> Author: Yanfei Xu <yanfei.xu@...driver.com>
> Date: Sun May 16 17:50:10 2021 +0800
>
> rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock
>
> If rcu_print_task_stall() is invoked on an rcu_node structure that does
> not contain any tasks blocking the current grace period, it takes an
> early exit that fails to release that rcu_node structure's lock. This
> results in a self-deadlock, which is detected by lockdep.
>
> To reproduce this bug:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 3 --trust-make --configs "TREE03" --kconfig "CONFIG_PROVE_LOCKING=y" --bootargs "rcutorture.stall_cpu=30 rcutorture.stall_cpu_block=1 rcutorture.fwd_progress=0 rcutorture.test_boost=0"
>
> This will also result in other complaints, including RCU's scheduler
> hook complaining about blocking rather than preemption and an rcutorture
> writer stall.
>
> Only a partial RCU CPU stall warning message will be printed because of
> the self-deadlock.
>
> This commit therefore releases the lock on the rcu_print_task_stall()
> function's early exit path.
>
> Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
> Signed-off-by: Yanfei Xu <yanfei.xu@...driver.com>
> Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
>
> diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> index a10ea1f1f81f..d574e3bbd929 100644
> --- a/kernel/rcu/tree_stall.h
> +++ b/kernel/rcu/tree_stall.h
> @@ -267,8 +267,10 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
>  	struct task_struct *ts[8];
>
>  	lockdep_assert_irqs_disabled();
> -	if (!rcu_preempt_blocked_readers_cgp(rnp))
> +	if (!rcu_preempt_blocked_readers_cgp(rnp)) {
> +		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
>  		return 0;
> +	}
>  	pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):",
>  	       rnp->level, rnp->grplo, rnp->grphi);
>  	t = list_entry(rnp->gp_tasks->prev,
>
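For anyone following the thread, here is a minimal userspace sketch of the
bug pattern described in the commit log above: a function that is entered
with a lock held and is expected to release it, but has an early exit that
skips the unlock. This is not kernel code. It assumes a pthread mutex as a
stand-in for the rcu_node ->lock, and struct node, has_blocked_tasks, the
two print_task_stall_*() functions, and lock_free_after() are all
hypothetical names made up for illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct rcu_node: a lock plus one flag. */
struct node {
	pthread_mutex_t lock;
	bool has_blocked_tasks;
};

/*
 * Buggy shape: the caller holds n->lock and expects this function to
 * release it, but the early exit returns with the lock still held.
 */
static int print_task_stall_buggy(struct node *n)
{
	if (!n->has_blocked_tasks)
		return 0;			/* lock leaked here */
	pthread_mutex_unlock(&n->lock);
	return 1;
}

/* Fixed shape: every exit path releases the lock. */
static int print_task_stall_fixed(struct node *n)
{
	if (!n->has_blocked_tasks) {
		pthread_mutex_unlock(&n->lock);
		return 0;
	}
	pthread_mutex_unlock(&n->lock);
	return 1;
}

/*
 * Call fn() with the lock held, then probe the lock with trylock.
 * Returns true if the lock was free afterwards. A blocking lock here
 * would hang on the leaked lock, which is the self-deadlock; trylock
 * lets the sketch report it instead.
 */
static bool lock_free_after(int (*fn)(struct node *), bool has_blocked)
{
	struct node n;
	bool is_free;

	pthread_mutex_init(&n.lock, NULL);
	n.has_blocked_tasks = has_blocked;

	pthread_mutex_lock(&n.lock);
	fn(&n);
	is_free = (pthread_mutex_trylock(&n.lock) == 0);

	/* We own the lock either way: leaked by fn() or taken by trylock. */
	pthread_mutex_unlock(&n.lock);
	pthread_mutex_destroy(&n.lock);
	return is_free;
}
```

Calling lock_free_after(print_task_stall_buggy, false) from a small main()
returns false, meaning a later blocking acquisition (the role played by
rcu_dump_cpu_stacks() in the report) would never succeed, while the fixed
variant returns true for both values of the flag.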