linux-kernel - Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb

Message-ID: <20210303013533.GA102493@lothringen>
Date:   Wed, 3 Mar 2021 02:35:33 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Boqun Feng <boqun.feng@...il.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Neeraj Upadhyay <neeraju@...eaurora.org>,
        Josh Triplett <josh@...htriplett.org>,
        Stable <stable@...r.kernel.org>,
        Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm

On Tue, Mar 02, 2021 at 10:17:29AM -0800, Paul E. McKenney wrote:
> On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:
> 
> OK, how about if I queue a temporary commit (shown below) that just
> calls out the first scenario so that I can start testing, and you get
> me more detail on the second scenario?  I can then update the commit.

Sure. Meanwhile, here is an attempt at a nocb_bypass_timer-based
scenario. It's overly hairy, and perhaps I picture more power
in the hands of callback advancing in nocb_cb_wait() than it
really has:


0.          CPU 0's ->nocb_cb_kthread just called rcu_do_batch() and
            executed all the ready callbacks. Its segcblist is now
            entirely empty. It's preempted while calling local_bh_enable().

1.          A new callback is enqueued on CPU 0 with IRQs enabled. So
            the ->nocb_gp_kthread for CPUs 0-2 is awakened. Then a storm
            of callback enqueues follows on CPU 0 and even reaches the
            bypass queue. Note that ->nocb_gp_kthread is also associated
            with CPU 0.

2.          CPU 0 queues one last bypass callback.

3.          The ->nocb_gp_kthread wakes up and associates a grace period
            with the whole queue of regular callbacks on CPU 0. It also
            tries to flush the bypass queue of CPU 0 but the bypass lock
            is contended due to the concurrent enqueuing in step 2, so
            the flush fails.

4.          This ->nocb_gp_kthread arms its ->nocb_bypass_timer and goes
            to sleep waiting for the end of this future grace period.

5.          This grace period elapses before the ->nocb_bypass_timer
            fires. This is normally improbable given that the timer is set
            for only two jiffies, but timers can be delayed.  Besides, it
            is possible that the kernel was built with
            CONFIG_RCU_STRICT_GRACE_PERIOD=y.

6.          The grace period ends, so rcu_gp_kthread awakens the
            ->nocb_gp_kthread, but it doesn't get a chance to run on a CPU
            for a while.

7.          CPU 0's ->nocb_cb_kthread gets back to the CPU after its preemption.
            As it notices the newly completed grace period, it advances the
            callbacks and executes them. Then it gets preempted again on
            local_bh_enable().

8.          A new callback enqueue on CPU 0 itself flushes the bypass queue
            because CPU 0's ->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy.

9.          CPUs from other ->nocb_gp_kthread groups (above CPU 2) start and
            complete a few grace periods. CPU 0's ->nocb_gp_kthread still hasn't
            had an opportunity to run on a CPU and its ->nocb_bypass_timer still
            hasn't fired.

10.         CPU 0's ->nocb_cb_kthread wakes up from preemption. It notices the
            new grace periods that have elapsed, advances all the callbacks and
            executes them. Then it goes to sleep waiting for invocable callbacks.

11.         CPU 0 enqueues a new callback with interrupts disabled, and
            defers awakening its ->nocb_gp_kthread even though ->nocb_gp_sleep
            is actually false. It therefore queues its rcu_data structure's
            ->nocb_timer. At this point, CPU 0's rdp->nocb_defer_wakeup is
            RCU_NOCB_WAKE.

12.         The ->nocb_bypass_timer finally fires! It doesn't wake up
            ->nocb_gp_kthread because it is actually already awake.
            But it cancels CPU 0's ->nocb_timer armed in step 11. Yet it
            doesn't re-initialize CPU 0's ->nocb_defer_wakeup, which stays
            with the stale RCU_NOCB_WAKE value. So CPU 0's ->nocb_defer_wakeup
            and its ->nocb_timer are now desynchronized.
            
13.         The ->nocb_gp_kthread finally runs. It cancels the ->nocb_bypass_timer
            which has already fired. It sees the new callback on CPU 0,
            associates it with a new grace period, then sleeps on it.
            
14.         The grace period elapses, rcu_gp_kthread wakes up ->nocb_gp_kthread
            which wakes up CPU 0's ->nocb_cb_kthread which runs the callback.
            Both ->nocb_gp_kthread and CPU 0's ->nocb_cb_kthread now wait for new
            callbacks.
            
15.         CPU 0 enqueues another callback, again with interrupts
            disabled, so it must queue a timer for a deferred wakeup. However
            the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE, which
            incorrectly indicates that a timer is already queued.  In fact,
            CPU 0's ->nocb_timer was cancelled in step 12.  CPU 0 therefore
            fails to queue the ->nocb_timer.

16.         CPU 0 now has a pending callback, which may go unnoticed until
            some other CPU wakes up ->nocb_gp_kthread or CPU 0 invokes
            an explicit deferred wakeup, for example during idle entry.
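
To make the desynchronization in steps 11, 12 and 15 easier to picture,
here is a toy userspace model of it. This is only a sketch and not kernel
code: struct toy_rdp, the toy_* helpers and the TOY_* states are all
hypothetical names that stand in for rdp->nocb_defer_wakeup and for
whether ->nocb_timer is queued. The "synced" cancel variant is only there
for contrast and is not meant to describe what the actual patch does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the deferred-wakeup states. */
enum toy_defer_state { TOY_WAKE_NOT = 0, TOY_WAKE = 1 };

/* Toy model of the two pieces of per-CPU state that must stay in sync. */
struct toy_rdp {
	enum toy_defer_state defer_wakeup;	/* models ->nocb_defer_wakeup */
	bool timer_pending;			/* models "->nocb_timer is queued" */
};

/*
 * Models an enqueue with interrupts disabled: the timer is armed only
 * if the flag says that no deferred wakeup is already pending.
 * Returns whether a timer is pending afterwards.
 */
static bool toy_enqueue_irqs_off(struct toy_rdp *rdp)
{
	if (rdp->defer_wakeup == TOY_WAKE_NOT) {
		rdp->defer_wakeup = TOY_WAKE;
		rdp->timer_pending = true;
	}
	return rdp->timer_pending;
}

/* Models step 12: the timer is cancelled but the flag is left stale. */
static void toy_cancel_timer_stale(struct toy_rdp *rdp)
{
	rdp->timer_pending = false;
}

/* Contrast case (an assumption): cancel and reset the flag together. */
static void toy_cancel_timer_synced(struct toy_rdp *rdp)
{
	rdp->timer_pending = false;
	rdp->defer_wakeup = TOY_WAKE_NOT;
}

/* Replays steps 11, 12 and 15 with the given cancel behaviour. */
static bool toy_replay(void (*cancel)(struct toy_rdp *))
{
	struct toy_rdp rdp = { TOY_WAKE_NOT, false };

	toy_enqueue_irqs_off(&rdp);		/* step 11: timer armed */
	assert(rdp.timer_pending);
	cancel(&rdp);				/* step 12: timer cancelled */
	return toy_enqueue_irqs_off(&rdp);	/* step 15: rearm attempt */
}
```

With the stale variant, toy_replay(toy_cancel_timer_stale) returns false:
the second enqueue sees TOY_WAKE and skips arming the timer, which is the
missed rearm of step 15. With toy_cancel_timer_synced it returns true.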


Thanks.