[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAEXW_YRvz8xf-6hpwpYqS=YNa-xkn4CsuJzELJxOH_2FP+6ptQ@mail.gmail.com>
Date: Wed, 13 Mar 2024 12:01:45 -0400
From: Joel Fernandes <joel@...lfernandes.org>
To: Boqun Feng <boqun.feng@...il.com>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, Florian Fainelli <f.fainelli@...il.com>,
linux-kernel@...r.kernel.org, kernel-team@...a.com, paulmck@...nel.org,
mingo@...nel.org, tglx@...utronix.de, rcu@...r.kernel.org,
neeraj.upadhyay@....com, urezki@...il.com, qiang.zhang1211@...il.com,
frederic@...nel.org, bigeasy@...utronix.de, anna-maria@...utronix.de,
chenzhongjin@...wei.com, yangjihong1@...wei.com, rostedt@...dmis.org
Subject: Re: Unexplained long boot delays [Was Re: [GIT PULL] RCU changes for v6.9]
Hello Florian,
On Tue, Mar 12, 2024 at 7:48 PM Boqun Feng <boqun.feng@...il.com> wrote:
>
> On Tue, Mar 12, 2024 at 02:44:14PM -0700, Linus Torvalds wrote:
> > On Tue, 12 Mar 2024 at 14:34, Florian Fainelli <f.fainelli@...il.com> wrote:
> > >
> > > and here is a log where this fails:
> > >
> > > https://gist.github.com/ffainelli/ed08a2b3e853f59343786ebd20364fc8
> >
> > You could try the 'initcall_debug' kernel command line.
> >
>
> Right, that'll be helpful.
>
> Besides I took a look at the config Florian shared, no TASKS_RCU,
> RCU_LAZY or RCU nocb is enabled. So probably the only left changes in
> the PR are around RCU expedited. Florian, could you see if you can build
> and test with CONFIG_PROVE_RCU=y (you need to select
> CONFIG_PROVE_LOCKING for that)? That'll call synchronize_rcu() +
> synchronize_rcu_expedited() before and after we switch
> rcu_scheduler_active, and it may provide more information. Thanks!
Adding to everyone's suggestions, could you also try booting with
"rcupdate.rcu_normal=1" ? This will disable expedited RCU and help us
further confirm that it is indeed expedited RCU (and then we can look
into fixing that).
Also there are 2 additional users of expedited RCU in this release I noticed:
78c3253f27e5 ("net: use synchronize_rcu_expedited in cleanup_net()")
1ebb85f9c03d ("netfilter: conntrack: expedite rcu in
nf_conntrack_cleanup_net_list")
Could you also try reverting those patches as well, and see if the
issue goes away?
thanks,
- Joel
On Tue, Mar 12, 2024 at 7:48 PM Boqun Feng <boqun.feng@...il.com> wrote:
>
> On Tue, Mar 12, 2024 at 02:44:14PM -0700, Linus Torvalds wrote:
> > On Tue, 12 Mar 2024 at 14:34, Florian Fainelli <f.fainelli@...il.com> wrote:
> > >
> > > and here is a log where this fails:
> > >
> > > https://gist.github.com/ffainelli/ed08a2b3e853f59343786ebd20364fc8
> >
> > You could try the 'initcall_debug' kernel command line.
> >
>
> Right, that'll be helpful.
>
> Besides I took a look at the config Florian shared, no TASKS_RCU,
> RCU_LAZY or RCU nocb is enabled. So probably the only left changes in
> the PR are around RCU expedited. Florian, could you see if you can build
> and test with CONFIG_PROVE_RCU=y (you need to select
> CONFIG_PROVE_LOCKING for that)? That'll call synchronize_rcu() +
> synchronize_rcu_expedited() before and after we switch
> rcu_scheduler_active, and it may provide more information. Thanks!
>
> Regards,
> Boqun
>
> > It will make the above *much* noisier, but it might - thanks to all
> > the new noise - show exactly *what* is being crazy slow to initialize.
> >
> > Because right now it's just radio silence in between those
> >
> > [ 1.926435] bcmgenet f0480000.ethernet: GENET 5.0 EPHY: 0x0000
> > [ 162.148135] unimac-mdio unimac-mdio.0: Broadcom UniMAC MDIO bus
> >
> > things, and that's presumably because some random initcall there just
> > takes forever to time out.
> >
> > Linus
Powered by blists - more mailing lists