Message-ID: <20181113135453.GW9144@intel.com>
Date:   Tue, 13 Nov 2018 15:54:53 +0200
From:   Ville Syrjälä <ville.syrjala@...ux.intel.com>
To:     "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
Cc:     linux-kernel@...r.kernel.org, Andi Kleen <ak@...ux.intel.com>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: [REGRESSION 4.20-rc1] 45975c7d21a1 ("rcu: Define RCU-sched API in
 terms of RCU for Tree RCU PREEMPT builds")

Hi Paul,

After 4.20-rc1 some of my 32bit UP machines no longer reboot/shutdown.
I bisected this down to commit 45975c7d21a1 ("rcu: Define RCU-sched
API in terms of RCU for Tree RCU PREEMPT builds").

I traced the hang into
-> cpufreq_suspend()
 -> cpufreq_stop_governor()
  -> cpufreq_dbs_governor_stop()
   -> gov_clear_update_util()
    -> synchronize_sched()
     -> synchronize_rcu()

Only PREEMPT=y is affected for obvious reasons, but that couldn't
explain why the same UP kernel booted on an SMP machine worked fine.
Eventually I realized that the difference between the working and
non-working machines was IOAPIC vs. PIC. With initcall_debug I saw
that we mask everything in the PIC before cpufreq is shut down,
and came up with the following fix:

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 7aa3dcad2175..f88bf3c77fc0 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2605,4 +2605,4 @@ static int __init cpufreq_core_init(void)
        return 0;
 }
 module_param(off, int, 0444);
-core_initcall(cpufreq_core_init);
+late_initcall(cpufreq_core_init);

Here's the resulting change in the initcall_debug output:
  pci 0000:00:00.1: shutdown
  hub 4-0:1.0: hub_ext_port_status failed (err = -110)
  agpgart-intel 0000:00:00.0: shutdown
+ PM: Calling cpufreq_suspend+0x0/0x100
  PM: Calling mce_syscore_shutdown+0x0/0x10
  PM: Calling i8259A_shutdown+0x0/0x10
- PM: Calling cpufreq_suspend+0x0/0x100
+ reboot: Restarting system
+ reboot: machine restart
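
To illustrate why moving the initcall changes the shutdown order, here is a
toy userspace model, not kernel code. It assumes that syscore ops are appended
to a list at registration time and that the shutdown path walks that list in
reverse, which is my understanding of register_syscore_ops() and
syscore_shutdown(). All toy_* names are hypothetical, and the relative
registration order of the mce and i8259A entries is only inferred from the
log above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPS 8

/* Toy stand-in for the syscore ops list (hypothetical, userspace only). */
struct toy_syscore {
	const char *ops[MAX_OPS];
	int count;
};

/* Append at the tail: earlier initcalls end up earlier in the list. */
static void toy_register(struct toy_syscore *s, const char *name)
{
	assert(s->count < MAX_OPS);
	s->ops[s->count++] = name;
}

/* Walk the list in reverse and write the call order into out. */
static void toy_shutdown_order(const struct toy_syscore *s,
			       char *out, size_t len)
{
	out[0] = '\0';
	for (int i = s->count - 1; i >= 0; i--) {
		strncat(out, s->ops[i], len - strlen(out) - 1);
		if (i > 0)
			strncat(out, " ", len - strlen(out) - 1);
	}
}

/* core_initcall: cpufreq registers first, so it is shut down last. */
static void toy_before_patch(char *out, size_t len)
{
	struct toy_syscore s = { .count = 0 };

	toy_register(&s, "cpufreq_suspend");
	toy_register(&s, "i8259A_shutdown");
	toy_register(&s, "mce_syscore_shutdown");
	toy_shutdown_order(&s, out, len);
}

/* late_initcall: cpufreq registers last, so it is shut down first. */
static void toy_after_patch(char *out, size_t len)
{
	struct toy_syscore s = { .count = 0 };

	toy_register(&s, "i8259A_shutdown");
	toy_register(&s, "mce_syscore_shutdown");
	toy_register(&s, "cpufreq_suspend");
	toy_shutdown_order(&s, out, len);
}
```

Calling toy_before_patch() and toy_after_patch() with a 128-byte buffer yields
the two call orders seen in the log: with the patch, cpufreq_suspend runs
while the PIC is still unmasked.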

I didn't really look into what other ramifications the cpufreq
initcall change might have. cpufreq_global_kobject worries
me a bit. Maybe that one has to remain in core_initcall() and
we could just move the suspend to late_initcall()? Anyways,
I figured I'd leave this for someone more familiar with the
code to figure out ;) 

-- 
Ville Syrjälä
Intel
