Message-ID: <20230315111606.GB2006103@hirez.programming.kicks-ass.net>
Date:   Wed, 15 Mar 2023 12:16:06 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Alexey Klimov <alexey.klimov@...aro.org>
Cc:     draszik@...gle.com, peter.griffin@...aro.org,
        willmcvicker@...gle.com, mingo@...nel.org, ulf.hansson@...aro.org,
        tony@...mide.com, linux-block@...r.kernel.org,
        linux-kernel@...r.kernel.org, axboe@...nel.dk,
        alim.akhtar@...sung.com, regressions@...ts.linux.dev,
        avri.altman@....com, bvanassche@....org, klimova@...gle.com
Subject: Re: [REGRESSION] CPUIDLE_FLAG_RCU_IDLE, blk_mq_freeze_queue_wait()
 and slow-stuck reboots


(could you wrap your email please)

On Tue, Mar 14, 2023 at 11:00:04PM +0000, Alexey Klimov wrote:
> #regzbot introduced: 0c5ffc3d7b15 #regzbot title:
> CPUIDLE_FLAG_RCU_IDLE, blk_mq_freeze_queue_wait() and slow-stuck
> reboots
> 
> The upstream changes are being merged into android-mainline repo and
> at some point we started to observe kernel panics on reboot or long
> reboot times.

On what hardware? I find it somewhat hard to follow this DT code :/

> Looks like adding CPUIDLE_FLAG_RCU_IDLE flag to idle driver caused
> this behaviour.  The minimal change that is required for this system
> to avoid the regression would be one liner that removes the flag
> (below).
> 
> But if it is a real regression, then other idle drivers if used will
> likely cause this regression too with the same ufshcd driver. There is
> also a suspicion that CPUIDLE_FLAG_RCU_IDLE just revealed or uncovered
> some other problem.
> 
> Any thoughts on this? 

So ARM has a weird 'rule' in that idle state 0 (wfi) should not have
RCU_IDLE set, while others should have.

Of the dt_init_idle_driver() users:

 - cpuidle-arm: arm_enter_idle_state()
 - cpuidle-big_little: bl_enter_powerdown() does ct_cpuidle_{enter,exit}()
 - cpuidle-psci: psci_enter_idle_state() uses CPU_PM_CPU_IDLE_ENTER_PARAM_RCU()
 - cpuidle-qcom-spm: spm_enter_idle_state() uses CPU_PM_CPU_IDLE_ENTER_PARAM()
 - cpuidle-riscv-sbi: sbi_cpuidle_enter_state() uses CPU_PM_CPU_IDLE_ENTER_*_PARAM()

All of them start on index 1 and hence should have RCU_IDLE set, but at
least the arm, qcom-spm and riscv-sbi don't actually appear to abide by
the rules.
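
As a rough illustration of the rule and the list above, here is a small userspace model. It is only a sketch: struct idle_driver_model, rcu_idle_expected() and find_violators() are hypothetical names that do not exist in the kernel, and the handles_rcu column just restates what the list says about each driver rather than anything checked against the tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model type; nothing like this exists in the kernel. */
struct idle_driver_model {
	const char *name;
	/*
	 * true if, per the list above, the enter path uses an _RCU macro
	 * variant or calls ct_cpuidle_{enter,exit}() itself.
	 */
	bool handles_rcu;
};

static const struct idle_driver_model drivers[] = {
	{ "cpuidle-arm",        false },
	{ "cpuidle-big_little", true  },
	{ "cpuidle-psci",       true  },
	{ "cpuidle-qcom-spm",   false },
	{ "cpuidle-riscv-sbi",  false },
};

/* The 'rule': idle state 0 (wfi) should not have RCU_IDLE set, others should. */
static bool rcu_idle_expected(int idx)
{
	return idx > 0;
}

/*
 * Collect the names of modelled drivers that do not follow the rule for
 * state index idx. Stores at most max names in out and returns how many
 * were stored.
 */
static size_t find_violators(int idx, const char **out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);

	if (!rcu_idle_expected(idx))
		return 0;

	for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
		if (!drivers[i].handles_rcu && n < max)
			out[n++] = drivers[i].name;
	}

	return n;
}
```

Calling find_violators(1, out, 5) from a small main() fills out with the three drivers named above; calling it with index 0 reports nothing, since the wfi state is exempt from the rule.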

Fixing that gives me the below; does that help?

---

diff --git a/drivers/cpuidle/cpuidle-arm.c b/drivers/cpuidle/cpuidle-arm.c
index 7cfb980a357d..58fa81f0fa7d 100644
--- a/drivers/cpuidle/cpuidle-arm.c
+++ b/drivers/cpuidle/cpuidle-arm.c
@@ -39,7 +39,7 @@ static __cpuidle int arm_enter_idle_state(struct cpuidle_device *dev,
 	 * will call the CPU ops suspend protocol with idle index as a
 	 * parameter.
 	 */
-	return CPU_PM_CPU_IDLE_ENTER(arm_cpuidle_suspend, idx);
+	return CPU_PM_CPU_IDLE_ENTER_RCU(arm_cpuidle_suspend, idx);
 }
 
 static struct cpuidle_driver arm_idle_driver __initdata = {
diff --git a/drivers/cpuidle/cpuidle-qcom-spm.c b/drivers/cpuidle/cpuidle-qcom-spm.c
index c6e2e91bb4c3..429db2d40114 100644
--- a/drivers/cpuidle/cpuidle-qcom-spm.c
+++ b/drivers/cpuidle/cpuidle-qcom-spm.c
@@ -64,7 +64,7 @@ static __cpuidle int spm_enter_idle_state(struct cpuidle_device *dev,
 	struct cpuidle_qcom_spm_data *data = container_of(drv, struct cpuidle_qcom_spm_data,
 							  cpuidle_driver);
 
-	return CPU_PM_CPU_IDLE_ENTER_PARAM(qcom_cpu_spc, idx, data->spm);
+	return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(qcom_cpu_spc, idx, data->spm);
 }
 
 static struct cpuidle_driver qcom_spm_idle_driver = {
diff --git a/drivers/cpuidle/cpuidle-riscv-sbi.c b/drivers/cpuidle/cpuidle-riscv-sbi.c
index be383f4b6855..04a601cda06b 100644
--- a/drivers/cpuidle/cpuidle-riscv-sbi.c
+++ b/drivers/cpuidle/cpuidle-riscv-sbi.c
@@ -100,10 +100,9 @@ static __cpuidle int sbi_cpuidle_enter_state(struct cpuidle_device *dev,
 	u32 state = states[idx];
 
 	if (state & SBI_HSM_SUSP_NON_RET_BIT)
-		return CPU_PM_CPU_IDLE_ENTER_PARAM(sbi_suspend, idx, state);
-	else
-		return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(sbi_suspend,
-							     idx, state);
+		return CPU_PM_CPU_IDLE_ENTER_PARAM_RCU(sbi_suspend, idx, state);
+
+	return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM_RCU(sbi_suspend, idx, state);
 }
 
 static __cpuidle int __sbi_enter_domain_idle_state(struct cpuidle_device *dev,
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 3183aeb7f5b4..dd92bdafe2d3 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -334,6 +334,9 @@ extern s64 cpuidle_governor_latency_req(unsigned int cpu);
 #define CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx)	\
 	__CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0, 0)
 
+#define CPU_PM_CPU_IDLE_ENTER_RCU(low_level_idle_enter, idx)	\
+	__CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0, 1)
+
 #define CPU_PM_CPU_IDLE_ENTER_RETENTION(low_level_idle_enter, idx)	\
 	__CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 1, 0)
 
