linux-kernel - Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb

Message-ID: <20210303020643.GV2696@paulmck-ThinkPad-P72>
Date:   Tue, 2 Mar 2021 18:06:43 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Boqun Feng <boqun.feng@...il.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Neeraj Upadhyay <neeraju@...eaurora.org>,
        Josh Triplett <josh@...htriplett.org>,
        Stable <stable@...r.kernel.org>,
        Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH 01/13] rcu/nocb: Fix potential missed nocb_timer rearm

On Wed, Mar 03, 2021 at 02:35:33AM +0100, Frederic Weisbecker wrote:
> On Tue, Mar 02, 2021 at 10:17:29AM -0800, Paul E. McKenney wrote:
> > On Tue, Mar 02, 2021 at 01:34:44PM +0100, Frederic Weisbecker wrote:
> > 
> > OK, how about if I queue a temporary commit (shown below) that just
> > calls out the first scenario so that I can start testing, and you get
> > me more detail on the second scenario?  I can then update the commit.
> 
> Sure. Meanwhile, here is an attempt at a nocb_bypass_timer-based
> scenario. It's overly hairy, and perhaps I picture more power
> in the hands of callbacks advancing on nocb_cb_wait() than it
> really has:

Thank you very much!

I must defer looking through this in detail until I am more awake,
but I do very much like the fine-grained exposition.

							Thanx, Paul

> 0.          CPU 0's ->nocb_cb_kthread just called rcu_do_batch() and
>             executed all the ready callbacks. Its segcblist is now
>             entirely empty. It's preempted while calling local_bh_enable().
> 
> 1.          A new callback is enqueued on CPU 0 with IRQs enabled, so
>             the ->nocb_gp_kthread for CPUs 0-2 is awakened. Then a storm
>             of callback enqueues follows on CPU 0 and even reaches the
>             bypass queue. Note that ->nocb_gp_kthread is also associated
>             with CPU 0.
> 
> 2.          CPU 0 queues one last bypass callback.
> 
> 3.          The ->nocb_gp_kthread wakes up and associates a grace period
>             with the whole queue of regular callbacks on CPU 0. It also
>             tries to flush the bypass queue of CPU 0, but the bypass lock
>             is contended due to the concurrent enqueuing in step 2, so
>             the flush fails.
> 
> 4.          This ->nocb_gp_kthread arms its ->nocb_bypass_timer and goes
>             to sleep waiting for the end of this future grace period.
> 
> 5.          This grace period elapses before the ->nocb_bypass_timer
>             fires. This is normally improbable given that the timer is set
>             for only two jiffies, but timers can be delayed. Besides, it
>             is possible that the kernel was built with
>             CONFIG_RCU_STRICT_GRACE_PERIOD=y.
> 
> 6.          The grace period ends, so rcu_gp_kthread awakens the
>             ->nocb_gp_kthread, but it doesn't get a chance to run on a CPU
>             for a while.
> 
> 7.          CPU 0's ->nocb_cb_kthread gets back to the CPU after its preemption.
>             As it notices the newly completed grace period, it advances the
>             callbacks and executes them. Then it gets preempted again while
>             calling local_bh_enable().
> 
> 8.          A new callback enqueue on CPU 0 flushes the bypass queue itself
>             because CPU 0's ->nocb_nobypass_count < nocb_nobypass_lim_per_jiffy.
> 
> 9.          CPUs from other ->nocb_gp_kthread groups (above CPU 2) start and
>             complete a few grace periods. CPU 0's ->nocb_gp_kthread still hasn't
>             had an opportunity to run on a CPU, and its ->nocb_bypass_timer still
>             hasn't fired.
> 
> 10.         CPU 0's ->nocb_cb_kthread wakes up from preemption. It notices the
>             new grace periods that have elapsed, advances all the callbacks and
>             executes them. Then it goes to sleep waiting for invocable callbacks.
> 
> 11.         CPU 0 enqueues a new callback with interrupts disabled, and
>             defers awakening its ->nocb_gp_kthread even though ->nocb_gp_sleep
>             is actually false. It therefore queues its rcu_data structure's
>             ->nocb_timer. At this point, CPU 0's rdp->nocb_defer_wakeup is
>             RCU_NOCB_WAKE.
> 
> 12.         The ->nocb_bypass_timer finally fires! It doesn't wake up
>             ->nocb_gp_kthread because it is already awake. But it
>             cancels CPU 0's ->nocb_timer armed in step 11. Yet it doesn't
>             re-initialize CPU 0's ->nocb_defer_wakeup, which keeps the
>             stale RCU_NOCB_WAKE value. So CPU 0's ->nocb_defer_wakeup and
>             its ->nocb_timer are now desynchronized.
>
> 13.         The ->nocb_gp_kthread finally runs. It cancels the ->nocb_bypass_timer,
>             which has already fired. It sees the new callback on CPU 0,
>             associates it with a new grace period, then sleeps on it.
>
> 14.         The grace period elapses, and rcu_gp_kthread wakes up ->nocb_gp_kthread,
>             which wakes up CPU 0's ->nocb_cb_kthread, which runs the callback.
>             Both ->nocb_gp_kthread and CPU 0's ->nocb_cb_kthread now wait for new
>             callbacks.
>
> 15.         CPU 0 enqueues another callback, again with interrupts
>             disabled, so it must queue a timer for a deferred wakeup. However,
>             the value of its ->nocb_defer_wakeup is RCU_NOCB_WAKE, which
>             incorrectly indicates that a timer is already queued. In fact,
>             CPU 0's ->nocb_timer was cancelled in step 12. CPU 0 therefore
>             fails to queue the ->nocb_timer.
> 
> 16.         CPU 0 now has a pending callback that may go unnoticed until
>             some other CPU wakes up ->nocb_gp_kthread or CPU 0 explicitly
>             performs a deferred wakeup, for example, during idle entry.
> 
> 
> Thanks.