Message-Id: <20181114202013.GA27603@linux.ibm.com>
Date: Wed, 14 Nov 2018 12:20:13 -0800
From: "Paul E. McKenney" <paulmck@...ux.ibm.com>
To: Ville Syrjälä <ville.syrjala@...ux.intel.com>
Cc: linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Viresh Kumar <viresh.kumar@...aro.org>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [REGRESSION 4.20-rc1] 45975c7d21a1 ("rcu: Define RCU-sched API
in terms of RCU for Tree RCU PREEMPT builds")
On Tue, Nov 13, 2018 at 07:10:37AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 13, 2018 at 03:54:53PM +0200, Ville Syrjälä wrote:
> > Hi Paul,
> >
> > After 4.20-rc1 some of my 32bit UP machines no longer reboot/shutdown.
> > I bisected this down to commit 45975c7d21a1 ("rcu: Define RCU-sched
> > API in terms of RCU for Tree RCU PREEMPT builds").
> >
> > I traced the hang into
> > -> cpufreq_suspend()
> > -> cpufreq_stop_governor()
> > -> cpufreq_dbs_governor_stop()
> > -> gov_clear_update_util()
> > -> synchronize_sched()
> > -> synchronize_rcu()
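For reference, after the bisected commit synchronize_sched() is essentially a
thin wrapper around synchronize_rcu(), so this path now waits for a full RCU
grace period even on a PREEMPT=y UP kernel. A minimal sketch of that mapping,
assuming the transitional wrapper in include/linux/rcupdate.h:

	/* Sketch only: the RCU-sched update-side API expressed in terms
	 * of consolidated RCU after 45975c7d21a1 and friends. */
	static inline void synchronize_sched(void)
	{
		synchronize_rcu();
	}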
> >
> > Only PREEMPT=y is affected for obvious reasons, but that couldn't
> > explain why the same UP kernel booted on an SMP machine worked fine.
> > Eventually I realized that the difference between working and
> > non-working machine was IOAPIC vs. PIC. With initcall_debug I saw
> > that we mask everything in the PIC before cpufreq is shut down,
> > and came up with the following fix:
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 7aa3dcad2175..f88bf3c77fc0 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -2605,4 +2605,4 @@ static int __init cpufreq_core_init(void)
> > return 0;
> > }
> > module_param(off, int, 0444);
> > -core_initcall(cpufreq_core_init);
> > +late_initcall(cpufreq_core_init);
>
> Thank you for testing this and tracking it down!
>
> I am glad that you have a fix, but I hope that we can arrive at a less
> constraining one.
>
> > Here's the resulting change in initcall_debug:
> > pci 0000:00:00.1: shutdown
> > hub 4-0:1.0: hub_ext_port_status failed (err = -110)
> > agpgart-intel 0000:00:00.0: shutdown
> > + PM: Calling cpufreq_suspend+0x0/0x100
> > PM: Calling mce_syscore_shutdown+0x0/0x10
> > PM: Calling i8259A_shutdown+0x0/0x10
> > - PM: Calling cpufreq_suspend+0x0/0x100
> > + reboot: Restarting system
> > + reboot: machine restart
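The reordering above follows from how the syscore shutdown path works: ops are
run in reverse order of registration, so registering cpufreq's syscore_ops from
a late_initcall() makes cpufreq_suspend() run before i8259A_shutdown() masks
the PIC. Roughly, a trimmed sketch of the loop in drivers/base/syscore.c, with
the initcall_debug printing and error handling omitted:

	void syscore_shutdown(void)
	{
		struct syscore_ops *ops;

		mutex_lock(&syscore_ops_lock);
		/* Later registrations are shut down first. */
		list_for_each_entry_reverse(ops, &syscore_ops_list, node)
			if (ops->shutdown)
				ops->shutdown();
		mutex_unlock(&syscore_ops_lock);
	}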
> >
> > I didn't really look into what other ramifications the cpufreq
> > initcall change might have. cpufreq_global_kobject worries
> > me a bit. Maybe that one has to remain in core_initcall() and
> > we could just move the suspend to late_initcall()? Anyways,
> > I figured I'd leave this for someone more familiar with the
> > code to figure out ;)
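A minimal sketch of that split, assuming (as the "PM: Calling" lines above
suggest) that cpufreq_suspend() is wired up via register_syscore_ops() from
cpufreq_core_init(); the cpufreq_pm_init() helper below is hypothetical and
the body of cpufreq_core_init() is abbreviated:

	static struct syscore_ops cpufreq_syscore_ops = {
		.shutdown = cpufreq_suspend,
	};

	static int __init cpufreq_core_init(void)
	{
		if (cpufreq_disabled())
			return -ENODEV;

		/* Keep the global kobject early so sysfs users see it. */
		cpufreq_global_kobject = kobject_create_and_add("cpufreq",
						&cpu_subsys.dev_root->kobj);
		BUG_ON(!cpufreq_global_kobject);

		return 0;
	}
	core_initcall(cpufreq_core_init);

	/* Hypothetical: only the shutdown/suspend hook moves later. */
	static int __init cpufreq_pm_init(void)
	{
		register_syscore_ops(&cpufreq_syscore_ops);
		return 0;
	}
	late_initcall(cpufreq_pm_init);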
>
> Let me guess...
>
> When the system suspends or shuts down, there comes a point after which
> there is only a single CPU that is running with preemption and interrupts
> are disabled. At this point, RCU must change the way that it works, and
> the commit you bisected to would make the change more necessary. But if
> I am guessing correctly, we have just been getting lucky in the past.
>
> It looks like RCU needs to create a struct syscore_ops with a shutdown
> function and pass this to register_syscore_ops(). Maybe a suspend
> function as well. And RCU needs to invoke register_syscore_ops() at
> a time that causes RCU's shutdown function to be invoked in the right
> order with respect to the other work in flight. The hope would be that
> RCU's suspend function gets called just as the system transitions into
> a mode where the scheduler is no longer active, give or take.
>
> Does this make sense, or am I confused?
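Roughly the following shape, as a hypothetical sketch of the proposal above;
rcu_pm_shutdown() and the notion of switching RCU into a reduced single-CPU
mode are placeholders here, not an existing API:

	static void rcu_pm_shutdown(void)
	{
		/* Hypothetical: flip RCU into a mode suitable for a single
		 * CPU running with interrupts disabled, along the lines of
		 * its early-boot behavior. */
	}

	static struct syscore_ops rcu_pm_syscore_ops = {
		.shutdown	= rcu_pm_shutdown,
		/* .suspend might be needed as well. */
	};

	static int __init rcu_pm_syscore_init(void)
	{
		register_syscore_ops(&rcu_pm_syscore_ops);
		return 0;
	}
	core_initcall(rcu_pm_syscore_init);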
Well, it certainly does not make sense in that blocking is still legal
at .shutdown() invocation time, which means that RCU cannot revert to
its boot-time approach at that point. Looks like I need hooks in a
bunch of arch-dependent functions. Which is certainly doable, but will
take a bit more digging.
Thanx, Paul