Message-ID: <20211117222554.GP641268@paulmck-ThinkPad-P17-Gen-1>
Date:   Wed, 17 Nov 2021 14:25:54 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Uladzislau Rezki <urezki@...il.com>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Josh Triplett <josh@...htriplett.org>,
        Joel Fernandes <joel@...lfernandes.org>, rcu@...r.kernel.org
Subject: Re: [PATCH 1/6] rcu/nocb: Remove rdp from nocb list when de-offloaded

On Wed, Nov 17, 2021 at 10:47:50PM +0100, Frederic Weisbecker wrote:
> On Wed, Nov 17, 2021 at 10:53:41AM -0800, Paul E. McKenney wrote:
> > On Wed, Nov 17, 2021 at 04:56:32PM +0100, Frederic Weisbecker wrote:
> > >  	pr_info("Offloading %d\n", rdp->cpu);
> > > +
> > > +	/*
> > > +	 * Iterate this CPU on nocb_gp_wait(). We do it before locking nocb_gp_lock,
> > > +	 * resetting nocb_gp_sleep and waking up the related "rcuog". Since nocb_gp_wait()
> > > +	 * in turn locks nocb_gp_lock before setting nocb_gp_sleep again, we are guaranteed
> > > +	 * to iterate this new rdp before "rcuog" goes to sleep again.
> > 
> > Just to make sure that I understand...
> > 
> > The ->nocb_entry_rdp list is RCU-protected, with an odd flavor of RCU.
> > The list_add_tail_rcu() handles list insertion.  For list deletion,
> > on the one hand, the rcu_data structures are never freed, so their
> > lifetime never ends.  But one could be removed during a de-offloading
> > operation, then re-added by a later offloading operation.  It would be
> > bad if a reader came along for that ride because that reader would end
> > up skipping all the rcu_data structures that were in the list after the
> > initial position of the one that was removed and added back.
> 
> How did I miss that :-(
> 
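For the archives, here is a rough sketch of that ride.  The
list_for_each_entry_rcu(), list_del_rcu(), and list_add_tail_rcu()
calls are the real primitives, but the surrounding names (rdp_gp,
nocb_head_rdp, do_nocb_work, rdp_a) are illustrative guesses rather
than the actual code:

	/* Reader, that is, the rcuog kthread: */
	list_for_each_entry_rcu(rdp, &rdp_gp->nocb_head_rdp, nocb_entry_rdp) {
		do_nocb_work(rdp);  /* Suppose we are sitting on rdp "A". */
	}

	/* Updater, de-offloading and then later re-offloading "A": */
	list_del_rcu(&rdp_a->nocb_entry_rdp);		/* De-offload. */
	/* rcu_data structures are never freed, so no grace period here. */
	list_add_tail_rcu(&rdp_a->nocb_entry_rdp,	/* Re-offload. */
			  &rdp_gp->nocb_head_rdp);

	/*
	 * "A"'s ->next now points back at the list head, so a reader
	 * still sitting on "A" terminates its traversal early, skipping
	 * every rdp that followed "A"'s old position in the list.
	 */
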
> > 
> > The trick seems to be that the de-offloading process cannot complete
> > until the relevant rcuog kthread has acknowledged the de-offloading,
> > which it cannot do until it has completed the list_for_each_entry_rcu()
> > loop.  And the rcuog kthread is the only thing that traverses the list,
> > except for the show_rcu_nocb_state() function, more on which later.
> > 
> > Therefore, it is not possible for the rcuog kthread to come along for
> > that ride.
> 
> Actually it's worse than that: the node is removed _after_ the kthread
> acknowledges de-offloading and added _before_ the kthread acknowledges
> offloading.  So a single rcuog loop may well run through an entire
> deletion/re-add pair.
> 
> Now since we force another loop, with the new add guaranteed visible, that
> new loop should handle any rdp's that the racing traversal skipped.
> 
> Let's hope I'm not missing something else... And yes that definitely needs
> a fat comment.
> 
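Agreed.  And in case it helps with that fat comment, here is the
ordering as I now understand it, in sketch form.  Some of the names
below (nocb_gp_lock, nocb_gp_sleep, "rcuog") are from your patch, but
the rest (nocb_head_rdp, nocb_gp_kthread, needwake) are my guesses, so
please treat this as a sketch rather than the actual code:

	/* Offloading side: add to the list first, then wake rcuog. */
	list_add_tail_rcu(&rdp->nocb_entry_rdp, &rdp_gp->nocb_head_rdp);

	raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
	WRITE_ONCE(rdp_gp->nocb_gp_sleep, false);
	raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
	wake_up_process(rdp_gp->nocb_gp_kthread);

	/* rcuog side, in nocb_gp_wait(): scan, then maybe sleep. */
	list_for_each_entry_rcu(rdp, &my_rdp->nocb_head_rdp, nocb_entry_rdp) {
		/* Process this rdp's callbacks. */
	}
	raw_spin_lock_irqsave(&my_rdp->nocb_gp_lock, flags);
	if (!needwake)
		WRITE_ONCE(my_rdp->nocb_gp_sleep, true);
	raw_spin_unlock_irqrestore(&my_rdp->nocb_gp_lock, flags);

	/*
	 * Because both sides serialize on ->nocb_gp_lock, either rcuog
	 * sees ->nocb_gp_sleep already false and immediately loops again,
	 * or rcuog sets that flag first and the offloading side then
	 * clears it and wakes rcuog up.  Either way there is one more
	 * full scan, and that scan must see the newly added rdp.
	 */
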
> > 
> > On to show_rcu_nocb_state()...
> > 
> > This does not actually traverse the list, but instead looks at the ->cpu
> > field of the next structure.  Because the ->next pointer is never NULLed,
> > the worst that can happen is that a confusing ->cpu field is printed,
> > for example, the one that was in effect prior to the de-offloading
> > operation.  But that number would have been printed anyway had the
> > show_rcu_nocb_state() function not been delayed across the de-offloading
> > and offloading.
> > 
> > So no harm done.
> 
> Exactly, that part is racy by nature.
> 
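And for the archives, that racy-but-benign read looks roughly like the
following.  The list_next_or_null_rcu() primitive is real, but the
rest of the shape is my reconstruction, not the exact code:

	/* In show_rcu_nocb_state(): peek at the next rdp, no traversal. */
	struct rcu_data *nxt = list_next_or_null_rcu(&rdp_gp->nocb_head_rdp,
						     &rdp->nocb_entry_rdp,
						     struct rcu_data,
						     nocb_entry_rdp);

	pr_info("  next nocb CPU: %d\n", nxt ? nxt->cpu : -1);

	/*
	 * If rdp was concurrently de-offloaded and re-offloaded, nxt->cpu
	 * may be stale, but the same value could have been printed had
	 * this simply run just before the de-offload.  Confusing,
	 * perhaps, but harmless.
	 */
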
> > 
> > Did I get it right?  If so, the comment might need help.  If not,
> > what am I missing?
> 
> You got it right!

Next question...  What things are the two of us put together missing?  ;-)

							Thanx, Paul
