linux-kernel - Re: [PATCH v2 5/8] rcu/nocb: Wake up gp thread when flushing

Message-ID: <20220626135240.GP1790663@paulmck-ThinkPad-P17-Gen-1>
Date:   Sun, 26 Jun 2022 06:52:40 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
        rushikesh.s.kadam@...el.com, urezki@...il.com,
        neeraj.iitr10@...il.com, frederic@...nel.org, rostedt@...dmis.org,
        vineeth@...byteword.org
Subject: Re: [PATCH v2 5/8] rcu/nocb: Wake up gp thread when flushing

On Sun, Jun 26, 2022 at 01:45:32PM +0000, Joel Fernandes wrote:
> On Sat, Jun 25, 2022 at 09:06:22PM -0700, Paul E. McKenney wrote:
> > On Wed, Jun 22, 2022 at 10:50:59PM +0000, Joel Fernandes (Google) wrote:
> > > We notice that rcu_barrier() can take a really long time. It appears
> > > that this can happen when all CBs are lazy and the timer has not fired
> > > yet, so after flushing, nothing wakes up the GP thread. This patch forces
> > > the GP thread to wake when bypass flushing happens, which fixes the
> > > rcu_barrier() delays with lazy CBs.
> > 
> > I am wondering if there is a bug in non-rcu_barrier() lazy callback
> > processing hiding here as well?
> 
> I don't think so, because in both nocb_try_bypass and nocb_gp_wait we do not
> go into an indefinite sleep after the flush. With rcu_barrier(), however,
> there is nothing to keep the RCU GP thread awake. That's my theory at least.
> In practice, I have not been able to reproduce this issue outside of
> rcu_barrier().
> 
> With rcu_barrier() I happen to hit it thanks to the rcuscale changes I did.
> That's an interesting story! When I applied call_rcu_lazy() to the file table
> code, it turned out that on boot, the initram unpacking code continuously
> triggers call_rcu_lazy(). This apparently happens in a different thread than
> the one that rcuscale is running in. In rcuscale, I did rcu_barrier() at init
> time, and to my surprise this stalled for a long time; this patch fixed it.

Cool!

Then should this wake_nocb_gp() instead go into the rcu_barrier()
code path?  As shown below, wouldn't we be doing some spurious wakeups?

							Thanx, Paul

> thanks,
> 
>  - Joel
> 
> 
> > 
> > 							Thanx, Paul
> > 
> > > Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
> > > ---
> > >  kernel/rcu/tree_nocb.h | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> > > index 2f5da12811a5..b481f1ea57c0 100644
> > > --- a/kernel/rcu/tree_nocb.h
> > > +++ b/kernel/rcu/tree_nocb.h
> > > @@ -325,6 +325,8 @@ static bool rcu_nocb_do_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> > >  	rcu_segcblist_insert_pend_cbs(&rdp->cblist, &rcl);
> > >  	WRITE_ONCE(rdp->nocb_bypass_first, j);
> > >  	rcu_nocb_bypass_unlock(rdp);
> > > +
> > > +	wake_nocb_gp(rdp, true);
> > >  	return true;
> > >  }
> > >  
> > > -- 
> > > 2.37.0.rc0.104.g0611611a94-goog
> > >