Date:   Wed, 17 Nov 2021 22:47:50 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Uladzislau Rezki <urezki@...il.com>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Josh Triplett <josh@...htriplett.org>,
        Joel Fernandes <joel@...lfernandes.org>, rcu@...r.kernel.org
Subject: Re: [PATCH 1/6] rcu/nocb: Remove rdp from nocb list when de-offloaded

On Wed, Nov 17, 2021 at 10:53:41AM -0800, Paul E. McKenney wrote:
> On Wed, Nov 17, 2021 at 04:56:32PM +0100, Frederic Weisbecker wrote:
> >  	pr_info("Offloading %d\n", rdp->cpu);
> > +
> > +	/*
> > +	 * Iterate this CPU on nocb_gp_wait(). We do it before locking nocb_gp_lock,
> > +	 * resetting nocb_gp_sleep and waking up the related "rcuog". Since nocb_gp_wait()
> > +	 * in turn locks nocb_gp_lock before setting nocb_gp_sleep again, we are guaranteed
> > +	 * to iterate this new rdp before "rcuog" goes to sleep again.
> 
> Just to make sure that I understand...
> 
> The ->nocb_entry_rdp list is RCU-protected, with an odd flavor of RCU.
> The list_add_tail_rcu() handles list insertion.  For list deletion,
> on the one hand, the rcu_data structures are never freed, so their
> lifetime never ends.  But one could be removed during a de-offloading
> operation, then re-added by a later offloading operation.  It would be
> bad if a reader came along for that ride because that reader would end
> up skipping all the rcu_data structures that were in the list after the
> initial position of the one that was removed and added back.

How did I miss that :-(
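
To make the hazard concrete, here is a minimal userspace sketch (plain C with
a hand-rolled list and made-up names, not the kernel's list_head/list_*_rcu
API): a traversal parked on a node that gets removed and re-added at the tail
resumes from the node's new position and silently skips everything that used
to follow it.

#include <stdio.h>

struct node {
	int cpu;
	struct node *next;
};

/* Append n at the tail of the NULL-terminated list rooted at *head. */
static void append(struct node **head, struct node *n)
{
	struct node **p = head;

	while (*p)
		p = &(*p)->next;
	n->next = NULL;
	*p = n;
}

/* Unlink n but leave n->next alone, much like RCU list deletion. */
static void unlink_node(struct node **head, struct node *n)
{
	struct node **p = head;

	while (*p && *p != n)
		p = &(*p)->next;
	if (*p)
		*p = n->next;
}

int main(void)
{
	struct node a = { .cpu = 0 }, b = { .cpu = 1 };
	struct node c = { .cpu = 2 }, d = { .cpu = 3 };
	struct node *head = NULL, *pos;

	append(&head, &a);
	append(&head, &b);
	append(&head, &c);
	append(&head, &d);

	pos = &b;			/* a reader is currently sitting on CPU 1's node */

	unlink_node(&head, &b);		/* de-offload: the node leaves the list */
	append(&head, &b);		/* later offload: it is re-added at the tail */

	for (; pos; pos = pos->next)	/* the stalled reader resumes from its old position */
		printf("visit cpu %d\n", pos->cpu);

	/* Prints only "visit cpu 1": the nodes for CPUs 2 and 3 were skipped. */
	return 0;
}

In the kernel the list is circular and the pointers are never NULL, but the
effect is the same: the stalled reader ends up at the node's new tail position
and misses the rcu_data structures that sat between its old and new spots.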

> 
> The trick seems to be that the de-offloading process cannot complete
> until the relevant rcuog kthread has acknowledged the de-offloading,
> which it cannot do until it has completed the list_for_each_entry_rcu()
> loop.  And the rcuog kthread is the only thing that traverses the list,
> except for the show_rcu_nocb_state() function, more on which later.
> 
> Therefore, it is not possible for the rcuog kthread to come along for
> that ride.

Actually it's worse than that: the node is removed _after_ the kthread
acknowledges de-offloading and re-added _before_ the kthread acknowledges
offloading. So a single rcuog loop may well run across an entire
deletion/re-add pair.

Now since we force another loop, with the new add guaranteed to be visible to
it, that new loop should handle any rdp's that the race caused to be skipped.
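
For the fix itself, the "forced extra pass" can be sketched in userspace as
well. This is only an illustrative pattern, pthreads plus a kick counter
standing in for the real nocb_gp_lock/nocb_gp_sleep handshake, but it shows
the property the comment relies on: the offloading side publishes the new
entry before taking the same lock the worker uses to decide whether to sleep,
so the worker is guaranteed one more full scan that sees that entry before it
can go back to sleep.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t gp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gp_cond = PTHREAD_COND_INITIALIZER;
static unsigned long gp_kicks;		/* bumped whenever a new scan is requested */

static void *rcuog_worker(void *arg)
{
	unsigned long seen = 0;

	(void)arg;
	for (;;) {
		/* ... scan the rdp list; anything published before the matching
		 *     kick below is guaranteed to be visible to this scan ... */
		pthread_mutex_lock(&gp_lock);
		while (gp_kicks == seen)	/* nothing new: go to sleep */
			pthread_cond_wait(&gp_cond, &gp_lock);
		seen = gp_kicks;		/* something was added: scan again */
		pthread_mutex_unlock(&gp_lock);
	}
	return NULL;
}

static void offload_one(void)
{
	/* 1. Publish the new entry (list_add_tail_rcu() in the kernel). */
	/* 2. Only then take the lock and kick the worker. */
	pthread_mutex_lock(&gp_lock);
	gp_kicks++;
	pthread_cond_signal(&gp_cond);
	pthread_mutex_unlock(&gp_lock);
}

int main(void)
{
	pthread_t tid;

	pthread_create(&tid, NULL, rcuog_worker, NULL);
	offload_one();
	sleep(1);	/* give the worker time to run its extra scan, then exit */
	return 0;
}

In the patch the role of the counter is played by nocb_gp_sleep: it is cleared
under nocb_gp_lock after the list_add_tail_rcu(), and nocb_gp_wait() takes
that same lock before setting it again, which is the guarantee the comment
above spells out.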

Let's hope I'm not missing something else... And yes that definitely needs
a fat comment.

> 
> On to show_rcu_nocb_state()...
> 
> This does not actually traverse the list, but instead looks at the ->cpu
> field of the next structure.  Because the ->next pointer is never NULLed,
> the worst that can happen is that a confusing ->cpu field is printed,
> for example, the one that was in effect prior to the de-offloading
> operation.  But that number would have been printed anyway had the
> show_rcu_nocb_state() function not been delayed across the de-offloading
> and offloading.
> 
> So no harm done.

Exactly, that part is racy by nature.

> 
> Did I get it right?  If so, the comment might need help.  If not,
> what am I missing?

You got it right!

Thanks!
