linux-kernel - Re: [REGRESSION 4.20-rc1] 45975c7d21a1 ("rcu: Define RCU-sched API in terms of RCU for Tree RCU PREEMPT builds")

Message-Id: <20181114202013.GA27603@linux.ibm.com>
Date:   Wed, 14 Nov 2018 12:20:13 -0800
From:   "Paul E. McKenney" <paulmck@...ux.ibm.com>
To:     Ville Syrjälä <ville.syrjala@...ux.intel.com>
Cc:     linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [REGRESSION 4.20-rc1] 45975c7d21a1 ("rcu: Define RCU-sched API
 in terms of RCU for Tree RCU PREEMPT builds")

On Tue, Nov 13, 2018 at 07:10:37AM -0800, Paul E. McKenney wrote:
> On Tue, Nov 13, 2018 at 03:54:53PM +0200, Ville Syrjälä wrote:
> > Hi Paul,
> > 
> > After 4.20-rc1 some of my 32bit UP machines no longer reboot/shutdown.
> > I bisected this down to commit 45975c7d21a1 ("rcu: Define RCU-sched
> > API in terms of RCU for Tree RCU PREEMPT builds").
> > 
> > I traced the hang into
> > -> cpufreq_suspend()
> >  -> cpufreq_stop_governor()
> >   -> cpufreq_dbs_governor_stop()
> >    -> gov_clear_update_util()
> >     -> synchronize_sched()
> >      -> synchronize_rcu()
> > 
> > Only PREEMPT=y is affected for obvious reasons, but that couldn't
> > explain why the same UP kernel booted on an SMP machine worked fine.
> > Eventually I realized that the difference between working and
> > non-working machine was IOAPIC vs. PIC. With initcall_debug I saw
> > that we mask everything in the PIC before cpufreq is shut down,
> > and came up with the following fix:
> > 
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 7aa3dcad2175..f88bf3c77fc0 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -2605,4 +2605,4 @@ static int __init cpufreq_core_init(void)
> >         return 0;
> >  }
> >  module_param(off, int, 0444);
> > -core_initcall(cpufreq_core_init);
> > +late_initcall(cpufreq_core_init);
> 
> Thank you for testing this and tracking it down!
> 
> I am glad that you have a fix, but I hope that we can arrive at a less
> constraining one.
> 
> > Here's the resulting change in initcall_debug:
> >   pci 0000:00:00.1: shutdown
> >   hub 4-0:1.0: hub_ext_port_status failed (err = -110)
> >   agpgart-intel 0000:00:00.0: shutdown
> > + PM: Calling cpufreq_suspend+0x0/0x100
> >   PM: Calling mce_syscore_shutdown+0x0/0x10
> >   PM: Calling i8259A_shutdown+0x0/0x10
> > - PM: Calling cpufreq_suspend+0x0/0x100
> > + reboot: Restarting system
> > + reboot: machine restart
> > 
> > I didn't really look into what other ramifications the cpufreq
> > initcall change might have. cpufreq_global_kobject worries
> > me a bit. Maybe that one has to remain in core_initcall() and
> > we could just move the suspend to late_initcall()? Anyways,
> > I figured I'd leave this for someone more familiar with the
> > code to figure out ;) 
> 
> Let me guess...
> 
> When the system suspends or shuts down, there comes a point after which
> there is only a single CPU running, with preemption and interrupts
> disabled.  At this point, RCU must change the way that it works, and
> the commit you bisected to would make the change more necessary.  But if
> I am guessing correctly, we have just been getting lucky in the past.
> 
> It looks like RCU needs to create a struct syscore_ops with a shutdown
> function and pass this to register_syscore_ops().  Maybe a suspend
> function as well.  And RCU needs to invoke register_syscore_ops() at
> a time that causes RCU's shutdown function to be invoked in the right
> order with respect to the other work in flight.  The hope would be that
> RCU's suspend function gets called just as the system transitions into
> a mode where the scheduler is no longer active, give or take.
> 
> Does this make sense, or am I confused?

Well, it certainly does not make sense in that blocking is still legal
at .shutdown() invocation time, which means that RCU cannot revert to
its boot-time approach at that point.  Looks like I need hooks in a
bunch of arch-dependent functions.  Which is certainly doable, but will
take a bit more digging.

							Thanx, Paul
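
For readers following the ordering argument in the quoted initcall_debug
output, here is a minimal user-space sketch of why moving a registration
from core_initcall() to late_initcall() moves its shutdown callback
earlier. It assumes, as I understand the syscore code, that
register_syscore_ops() appends to a list and that the shutdown pass walks
that list in reverse, which is consistent with the log above. Everything
named model_* is hypothetical and exists only for this illustration; it
is not the kernel API, and the callbacks are stand-ins.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPS 8

/* Hypothetical stand-in for the kernel's struct syscore_ops. */
struct model_syscore_ops {
	const char *name;
	void (*shutdown)(void);
};

static const struct model_syscore_ops *ops_list[MAX_OPS];
static size_t ops_count;

/* Names of the callbacks, in the order they were invoked. */
static const char *call_log[MAX_OPS];
static size_t call_count;

static void log_call(const char *name)
{
	assert(call_count < MAX_OPS);
	call_log[call_count++] = name;
}

static void model_cpufreq_suspend(void)  { log_call("cpufreq"); }
static void model_mce_shutdown(void)     { log_call("mce"); }
static void model_i8259A_shutdown(void)  { log_call("i8259A"); }

static const struct model_syscore_ops cpufreq_ops = { "cpufreq", model_cpufreq_suspend };
static const struct model_syscore_ops mce_ops     = { "mce",     model_mce_shutdown };
static const struct model_syscore_ops pic_ops     = { "i8259A",  model_i8259A_shutdown };

/* Assumption: registration appends to the tail of the list. */
static void model_register_syscore_ops(const struct model_syscore_ops *ops)
{
	assert(ops_count < MAX_OPS);
	ops_list[ops_count++] = ops;
}

/* Assumption: shutdown walks the list in reverse registration order. */
static void model_syscore_shutdown(void)
{
	for (size_t i = ops_count; i-- > 0; )
		if (ops_list[i]->shutdown)
			ops_list[i]->shutdown();
}

/*
 * Register cpufreq either first (modelling core_initcall) or last
 * (modelling late_initcall), run the shutdown pass, and return the
 * position at which the cpufreq callback ran (0 = first).
 */
static int model_simulate(int cpufreq_late)
{
	ops_count = 0;
	call_count = 0;

	if (!cpufreq_late)
		model_register_syscore_ops(&cpufreq_ops);
	model_register_syscore_ops(&pic_ops);
	model_register_syscore_ops(&mce_ops);
	if (cpufreq_late)
		model_register_syscore_ops(&cpufreq_ops);

	model_syscore_shutdown();

	for (size_t i = 0; i < call_count; i++)
		if (strcmp(call_log[i], "cpufreq") == 0)
			return (int)i;
	return -1;
}
```

In this model, model_simulate(0) runs the cpufreq callback last, after the
PIC has been shut down, and model_simulate(1) runs it first. The relative
registration order of the mce and i8259A entries is inferred from the log
rather than checked against the source.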