lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20200106064155.64-1-chuansheng.liu@intel.com>
Date:   Mon,  6 Jan 2020 06:41:55 +0000
From:   Chuansheng Liu <chuansheng.liu@...el.com>
To:     linux-kernel@...r.kernel.org
Cc:     tony.luck@...el.com, bp@...en8.de, tglx@...utronix.de,
        mingo@...hat.com, hpa@...or.com, chuansheng.liu@...el.com
Subject: [PATCH] x86/mce/therm_throt: Fix the access of uninitialized therm_work

In ICL platform, it is easy to hit bootup failure with panic
in thermal interrupt handler during early bootup stage.

Such issue makes my platform almost can not boot up with
latest kernel code.

The call stack is like:
kernel BUG at kernel/timer/timer.c:1152!

Call Trace:
__queue_delayed_work
queue_delayed_work_on
therm_throt_process
intel_thermal_interrupt
...

When one CPU is up, the irq is enabled prior to CPU UP
notification which will then initialize therm_worker.
Such race will cause the posssibility that interrupt
handler therm_throt_process() accesses uninitialized
therm_work, then system hit panic at very early bootup
stage.

In my ICL platform, it can be reproduced in several times
of reboot stress. With this fix, the system keeps alive
for more than 200 times of reboot stress.

Signed-off-by: Chuansheng Liu <chuansheng.liu@...el.com>
---
 arch/x86/kernel/cpu/mce/therm_throt.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index b38010b541d6..7320eb3ac029 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -86,6 +86,7 @@ struct _thermal_state {
 	unsigned long		total_time_ms;
 	bool			rate_control_active;
 	bool			new_event;
+	bool			therm_work_active;
 	u8			level;
 	u8			sample_index;
 	u8			sample_count;
@@ -359,7 +360,9 @@ static void therm_throt_process(bool new_event, int event, int level)
 
 		state->baseline_temp = temp;
 		state->last_interrupt_time = now;
-		schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL);
+		if (state->therm_work_active)
+			schedule_delayed_work_on(this_cpu, &state->therm_work,
+					THERM_THROT_POLL_INTERVAL);
 	} else if (old_event && state->last_interrupt_time) {
 		unsigned long throttle_time;
 
@@ -473,7 +476,8 @@ static int thermal_throttle_online(unsigned int cpu)
 
 	INIT_DELAYED_WORK(&state->package_throttle.therm_work, throttle_active_work);
 	INIT_DELAYED_WORK(&state->core_throttle.therm_work, throttle_active_work);
-
+	state->package_throttle.therm_work_active = true;
+	state->core_throttle.therm_work_active = true;
 	return thermal_throttle_add_dev(dev, cpu);
 }
 
@@ -482,6 +486,8 @@ static int thermal_throttle_offline(unsigned int cpu)
 	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
 
+	state->package_throttle.therm_work_active = false;
+	state->core_throttle.therm_work_active = false;
 	cancel_delayed_work(&state->package_throttle.therm_work);
 	cancel_delayed_work(&state->core_throttle.therm_work);
 
-- 
2.17.1

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ