lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20181113151037.GG4170@linux.ibm.com>
Date:   Tue, 13 Nov 2018 07:10:37 -0800
From:   "Paul E. McKenney" <paulmck@...ux.ibm.com>
To:     Ville Syrjälä <ville.syrjala@...ux.intel.com>
Cc:     linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: [REGRESSION 4.20-rc1] 45975c7d21a1 ("rcu: Define RCU-sched API
 in terms of RCU for Tree RCU PREEMPT builds")

On Tue, Nov 13, 2018 at 03:54:53PM +0200, Ville Syrjälä wrote:
> Hi Paul,
> 
> After 4.20-rc1 some of my 32bit UP machines no longer reboot/shutdown.
> I bisected this down to commit 45975c7d21a1 ("rcu: Define RCU-sched
> API in terms of RCU for Tree RCU PREEMPT builds").
> 
> I traced the hang into
> -> cpufreq_suspend()
>  -> cpufreq_stop_governor()
>   -> cpufreq_dbs_governor_stop()
>    -> gov_clear_update_util()
>     -> synchronize_sched()
>      -> synchronize_rcu()
> 
> Only PREEMPT=y is affected for obvious reasons, but that couldn't
> explain why the same UP kernel booted on an SMP machine worked fine.
> Eventually I realized that the difference between working and
> non-working machine was IOAPIC vs. PIC. With initcall_debug I saw
> that we mask everything in the PIC before cpufreq is shut down,
> and came up with the following fix:
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 7aa3dcad2175..f88bf3c77fc0 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -2605,4 +2605,4 @@ static int __init cpufreq_core_init(void)
>         return 0;
>  }
>  module_param(off, int, 0444);
> -core_initcall(cpufreq_core_init);
> +late_initcall(cpufreq_core_init);

Thank you for testing this and tracking it down!

I am glad that you have a fix, but I hope that we can arrive at a less
constraining one.

> Here's the resulting change in inutcall_debug:
>   pci 0000:00:00.1: shutdown
>   hub 4-0:1.0: hub_ext_port_status failed (err = -110)
>   agpgart-intel 0000:00:00.0: shutdown
> + PM: Calling cpufreq_suspend+0x0/0x100
>   PM: Calling mce_syscore_shutdown+0x0/0x10
>   PM: Calling i8259A_shutdown+0x0/0x10
> - PM: Calling cpufreq_suspend+0x0/0x100
> + reboot: Restarting system
> + reboot: machine restart
> 
> I didn't really look into what other ramifications the cpufreq
> initcall change might have. cpufreq_global_kobject worries
> me a bit. Maybe that one has to remain in core_initcall() and
> we could just move the suspend to late_initcall()? Anyways,
> I figured I'd leave this for someone more familiar with the
> code to figure out ;) 

Let me guess...

When the system suspends or shuts down, there comes a point after which
there is only a single CPU that is running with preemption and interrupts
are disabled.  At this point, RCU must change the way that it works, and
the commit you bisected to would make the change more necessary.  But if
I am guessing correctly, we have just been getting lucky in the past.

It looks like RCU needs to create a struct syscore_ops with a shutdown
function and pass this to register_syscore_ops().  Maybe a suspend
function as well.  And RCU needs to invoke register_syscore_ops() at
a time that causes RCU's shutdown function to be invoked in the right
order with respect to the other work in flight.  The hope would be that
RCU's suspend function gets called just as the system transitions into
a mode where the scheduler is no longer active, give or take.

Does this make sense, or am I confused?

							Thanx, Paul

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ