Message-ID: <20180626203225.GT2494@hirez.programming.kicks-ass.net>
Date:   Tue, 26 Jun 2018 22:32:25 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     linux-kernel@...r.kernel.org, mingo@...nel.org,
        jiangshanlai@...il.com, dipankar@...ibm.com,
        akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
        josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
        dhowells@...hat.com, edumazet@...gle.com, fweisbec@...il.com,
        oleg@...hat.com, joel@...lfernandes.org
Subject: Re: [PATCH tip/core/rcu 13/22] rcu: Fix grace-period hangs due to
 race with CPU offline

On Tue, Jun 26, 2018 at 01:26:15PM -0700, Paul E. McKenney wrote:
> commit 2e5b2ff4047b138d6b56e4e3ba91bc47503cdebe
> Author: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> Date:   Fri May 25 19:23:09 2018 -0700
> 
>     rcu: Fix grace-period hangs due to race with CPU offline
>     
>     Without special fail-safe quiescent-state-propagation checks, grace-period
>     hangs can result from the following scenario:
>     
>     1.      CPU 1 goes offline.
>     
>     2.      Because CPU 1 is the only CPU in the system blocking the current
>             grace period, the grace period ends as soon as
>             rcu_cleanup_dying_idle_cpu()'s call to rcu_report_qs_rnp()
>             returns.

My current code doesn't have that call... so this must be a problem
introduced earlier in this series.

>     3.      At this point, the leaf rcu_node structure's ->lock is no longer
>             held: rcu_report_qs_rnp() has released it, as it must in order
>             to awaken the RCU grace-period kthread.
>     
>     4.      At this point, that same leaf rcu_node structure's ->qsmaskinitnext
>             field still records CPU 1 as being online.  This is absolutely
>             necessary because the scheduler uses RCU (in this case on the
>             wake-up path while awakening RCU's grace-period kthread), and
>             ->qsmaskinitnext contains RCU's idea as to which CPUs are online.
>             Therefore, invoking rcu_report_qs_rnp() after clearing CPU 1's
>             bit from ->qsmaskinitnext would result in a lockdep-RCU splat
>             due to RCU being used from an offline CPU.

Argh.. so it's your own wakeup!

This all still smells really bad. But let me try and figure out where
you introduced the problem.
