Date:   Sun, 16 May 2021 15:58:53 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     yanfei.xu@...driver.com
Cc:     josh@...htriplett.org, rostedt@...dmis.org,
        mathieu.desnoyers@...icios.com, jiangshanlai@...il.com,
        joel@...lfernandes.org, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] rcu: fix a deadlock caused by not release
 rcu_node->lock

On Sun, May 16, 2021 at 05:50:10PM +0800, yanfei.xu@...driver.com wrote:
> From: Yanfei Xu <yanfei.xu@...driver.com>
> 
> rcu_node->lock isn't released in rcu_print_task_stall() if the rcu_node
> doesn't contain any tasks blocking the GP. However, this rcu_node->lock
> is acquired again shortly afterward in rcu_dump_cpu_stacks() when
> ndetected is non-zero. As a result, the CPU hangs on this deadlock.
> 
> Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
> Signed-off-by: Yanfei Xu <yanfei.xu@...driver.com>

Also a good catch, thank you!  Queued for further review and testing,
wordsmithed as shown below.  The rcutorture scripts have been known to
work on ARM in the past, and might still do so.  (I test on x86.)

As always, please check to make sure that I didn't mess something up.

							Thanx, Paul

------------------------------------------------------------------------

commit e0a9b77f245ae4fe1537120fd5319bf9e091618e
Author: Yanfei Xu <yanfei.xu@...driver.com>
Date:   Sun May 16 17:50:10 2021 +0800

    rcu: Fix stall-warning deadlock due to non-release of rcu_node ->lock
    
    If rcu_print_task_stall() is invoked on an rcu_node structure that does
    not contain any tasks blocking the current grace period, it takes an
    early exit that fails to release that rcu_node structure's lock.  This
    results in a self-deadlock, which is detected by lockdep.
    
    To reproduce this bug:
    
    tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 3 --trust-make --configs "TREE03" --kconfig "CONFIG_PROVE_LOCKING=y" --bootargs "rcutorture.stall_cpu=30 rcutorture.stall_cpu_block=1 rcutorture.fwd_progress=0 rcutorture.test_boost=0"
    
    This will also result in other complaints, including RCU's scheduler
    hook complaining about blocking rather than preemption and an rcutorture
    writer stall.
    
    Only a partial RCU CPU stall warning message will be printed because of
    the self-deadlock.
    
    This commit therefore releases the lock on the rcu_print_task_stall()
    function's early exit path.
    
    Fixes: c583bcb8f5ed ("rcu: Don't invoke try_invoke_on_locked_down_task() with irqs disabled")
    Signed-off-by: Yanfei Xu <yanfei.xu@...driver.com>
    Signed-off-by: Paul E. McKenney <paulmck@...nel.org>

diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index a10ea1f1f81f..d574e3bbd929 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -267,8 +267,10 @@ static int rcu_print_task_stall(struct rcu_node *rnp, unsigned long flags)
 	struct task_struct *ts[8];
 
 	lockdep_assert_irqs_disabled();
-	if (!rcu_preempt_blocked_readers_cgp(rnp))
+	if (!rcu_preempt_blocked_readers_cgp(rnp)) {
+		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
 		return 0;
+	}
 	pr_err("\tTasks blocked on level-%d rcu_node (CPUs %d-%d):",
 	       rnp->level, rnp->grplo, rnp->grphi);
 	t = list_entry(rnp->gp_tasks->prev,
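
For readers who want to see the locking contract in isolation, here is a
minimal user-space sketch of the same pattern.  It is not kernel code and
not part of the patch: struct node, node_lock(), node_unlock(), the two
print_task_stall_*() functions, and lock_released_after() are all
hypothetical names, and a C11 atomic_flag stands in for the rcu_node
structure's ->lock.  The callee is entered with the lock held and is
expected to release it on every return path, which is what the early exit
failed to do.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct rcu_node; illustration only. */
struct node {
	atomic_flag lock;	/* set means "held" */
	int nblocked;		/* tasks blocking the current grace period */
};

void node_lock(struct node *n)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set(&n->lock))
		;
}

void node_unlock(struct node *n)
{
	atomic_flag_clear(&n->lock);
}

/* Contract: entered with n->lock held, must release it before returning. */
int print_task_stall_buggy(struct node *n)
{
	if (!n->nblocked)
		return 0;	/* Early exit leaves n->lock held. */
	node_unlock(n);
	return n->nblocked;
}

int print_task_stall_fixed(struct node *n)
{
	if (!n->nblocked) {
		node_unlock(n);	/* Release on the early exit path as well. */
		return 0;
	}
	node_unlock(n);
	return n->nblocked;
}

/*
 * Invoke fn with the lock held, then probe the lock with a single
 * test-and-set instead of spinning, so that a leaked lock is reported
 * rather than deadlocked on.
 */
bool lock_released_after(int (*fn)(struct node *), int nblocked)
{
	struct node n = { .lock = ATOMIC_FLAG_INIT, .nblocked = nblocked };
	bool was_held;

	node_lock(&n);
	fn(&n);
	was_held = atomic_flag_test_and_set(&n.lock);
	node_unlock(&n);
	return !was_held;
}
```

Calling lock_released_after(print_task_stall_buggy, 0) exercises the
early exit with no blocked tasks.  In the kernel, the next acquisition of
the same lock would spin forever instead of being probed once, hence the
self-deadlock.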
