Message-ID: <d90bd6d9-d15c-4b9b-8a69-95336e74e8f4@paulmck-laptop>
Date: Sun, 2 Mar 2025 09:39:44 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Uladzislau Rezki <urezki@...il.com>
Cc: Boqun Feng <boqun.feng@...il.com>, RCU <rcu@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Frederic Weisbecker <frederic@...nel.org>,
Cheung Wall <zzqq0103.hey@...il.com>,
Neeraj upadhyay <Neeraj.Upadhyay@....com>,
Joel Fernandes <joel@...lfernandes.org>,
Oleksiy Avramchenko <oleksiy.avramchenko@...y.com>
Subject: Re: [PATCH v4 3/3] rcu: Use _full() API to debug synchronize_rcu()
On Sun, Mar 02, 2025 at 11:19:44AM +0100, Uladzislau Rezki wrote:
> On Fri, Feb 28, 2025 at 05:08:49PM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 28, 2025 at 11:59:55AM -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 28, 2025 at 08:12:51PM +0100, Uladzislau Rezki wrote:
> > > > Hello, Paul!
> > > >
> > > > > > > > >
> > > > > > > > > Except that I got this from overnight testing of rcu/dev on the shared
> > > > > > > > > RCU tree:
> > > > > > > > >
> > > > > > > > > WARNING: CPU: 5 PID: 14 at kernel/rcu/tree.c:1636 rcu_sr_normal_complete+0x5c/0x80
> > > > > > > > >
> > > > > > > > > I see this only on TREE05. Which should not be too surprising, given
> > > > > > > > > that this is the scenario that tests it. It happened within five minutes
> > > > > > > > > on all 14 of the TREE05 runs.
> > > > > > > > >
> > > > Hm.. This is not fun. I tested this on my system and I did not manage to
> > > > trigger it, whereas you do. Something is wrong.
> > > > > > >
> > > > > > > If you have a debug patch, I would be happy to give it a go.
> > > > > > >
> > > > > > I can trigger it. But.
> > > > > >
> > > > > > Some background. I tested those patches for many hours on the stable
> > > > > > kernel, which is 6.13. On that kernel I was not able to trigger it. Running
> > > > > > rcutorture on our shared "dev" tree, which I did now, triggers this
> > > > > > right away.
> > > > >
> > > > > Bisection? (Hey, you knew that was coming!)
> > > > >
> > > > Looks like this: rcu: Fix get_state_synchronize_rcu_full() GP-start detection
> > > >
> > > > After reverting it in dev, rcutorture passes TREE05, 16 instances.
> > >
> > > Huh. We sure don't get to revert that one...
> > >
> > > Do we have a problem with the ordering in rcu_gp_init() between the calls
> > > to rcu_seq_start() and portions of rcu_sr_normal_gp_init()? For example,
> > > do we need to capture the relevant portion of the list before the call
> > > to rcu_seq_start(), and do the grace-period-start work afterwards?
> >
> > I tried moving the call to rcu_sr_normal_gp_init() before the call to
> > rcu_seq_start() and got no failures in a one-hour run of 200*TREE05.
> > Which does not necessarily mean that this is the correct fix, but I
> > figured that it might at least provide food for thought.
> >
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 48384fa2eaeb8..d3efeff7740e7 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1819,10 +1819,10 @@ static noinline_for_stack bool rcu_gp_init(void)
> >  
> >  	/* Advance to a new grace period and initialize state. */
> >  	record_gp_stall_check_time();
> > +	start_new_poll = rcu_sr_normal_gp_init();
> >  	/* Record GP times before starting GP, hence rcu_seq_start(). */
> >  	rcu_seq_start(&rcu_state.gp_seq);
> >  	ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > -	start_new_poll = rcu_sr_normal_gp_init();
> >  	trace_rcu_grace_period(rcu_state.name, rcu_state.gp_seq, TPS("start"));
> >  	rcu_poll_gp_seq_start(&rcu_state.gp_seq_polled_snap);
> >  	raw_spin_unlock_irq_rcu_node(rnp);
> >
> Running this for 24 hours already, TREE05 * 16 scenario. I do not see any
> warnings yet. There is a race, indeed. The gp_seq is moved forward,
> whereas clients can still come until rcu_sr_normal_gp_init() places a
> dummy-wait-head for this GP.
>
> Thank you for testing, Paul, and for looking into this :)
Very good! This is a bug in this commit of mine:

012f47f0f806 ("rcu: Fix get_state_synchronize_rcu_full() GP-start detection")

Boqun, could you please fold this into that commit with something like
this added to the commit log just before the paragraph starting with
"Although this fixes 91a967fd6934"?

However, simply changing the get_state_synchronize_rcu_full() function
to use rcu_state.gp_seq instead of the root rcu_node structure's
->gp_seq field results in a theoretical bug in kernels booted
with rcutree.rcu_normal_wake_from_gp=1 due to the following
sequence of events:

o	The rcu_gp_init() function invokes rcu_seq_start()
	to officially start a new grace period.

o	A new RCU reader begins, referencing X from some
	RCU-protected list. The new grace period is not
	obligated to wait for this reader.

o	An updater removes X, then calls synchronize_rcu(),
	which queues a wait element.

o	The grace period ends, awakening the updater, which
	frees X while the reader is still referencing it.

The reason that this is theoretical is that although the
grace period has officially started, none of the CPUs are
officially aware of this, and thus will have to assume that
the RCU reader pre-dated the start of the grace period.

Except for kernels built with CONFIG_PROVE_RCU=y, which use the
polled grace-period APIs, which can and do complain bitterly when
this sequence of events occurs. Not only that, there might be
some future RCU grace-period mechanism that pulls this sequence
of events from theory into practice. This commit therefore
also pulls the call to rcu_sr_normal_gp_init() to precede that
to rcu_seq_start().
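
For context, the complaint mentioned above comes from a debug cross-check
along the following lines. This is only a simplified sketch under the
assumption that the wait element carries a full polled-API cookie taken
at synchronize_rcu() time; the struct and function names here are
illustrative, not the exact kernel/rcu/tree.c code:

#include <linux/rcupdate.h>
#include <linux/completion.h>
#include <linux/bug.h>

/* Illustrative wait element; the real one lives in kernel/rcu/tree.c. */
struct sr_wait_sketch {
	struct rcu_gp_oldstate oldstate;	/* get_state_synchronize_rcu_full() cookie */
	struct completion completion;
};

/* Invoked when the normal grace period waiting on this element ends. */
static void sr_normal_complete_sketch(struct sr_wait_sketch *rs)
{
	/*
	 * If the full polled-API cookie says that a full grace period
	 * has not yet elapsed, the sequence of events above occurred.
	 */
	WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
		  !poll_state_synchronize_rcu_full(&rs->oldstate),
		  "A full grace period is not passed yet!\n");
	complete(&rs->completion);
}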
I will let you guys decide whether the call to rcu_sr_normal_gp_init()
needs a comment, and, if so, what that comment should say. ;-)
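
Should a comment be wanted there, one possibility might look something
like the following sketch, matching the diff above; the comment wording
is purely illustrative:

	/* Advance to a new grace period and initialize state. */
	record_gp_stall_check_time();
	/*
	 * Snapshot the current synchronize_rcu() waiters before
	 * rcu_seq_start() makes this grace period visible to
	 * get_state_synchronize_rcu_full(), so that a waiter arriving
	 * after the grace period has officially started (and whose
	 * pre-existing readers this grace period is therefore not
	 * obligated to wait for) is handled by a later grace period.
	 */
	start_new_poll = rcu_sr_normal_gp_init();
	/* Record GP times before starting GP, hence rcu_seq_start(). */
	rcu_seq_start(&rcu_state.gp_seq);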
Thanx, Paul