Message-ID: <20241207144721.2828390-1-koichiro.den@canonical.com>
Date: Sat,  7 Dec 2024 23:47:21 +0900
From: Koichiro Den <koichiro.den@...onical.com>
To: linux-kernel@...r.kernel.org
Cc: tglx@...utronix.de,
	peterz@...radead.org
Subject: [PATCH] cpu/hotplug: ensure the starting section runs fully regardless of target

When CONFIG_CPU_HOTPLUG_STATE_CONTROL=y, writing a state within the
STARTING section to the 'hotplug/target' file of a fully online CPU can
cause a kernel crash [1]. This occurs because take_cpu_down() disables
the CPU, but the state machine stops at the written target and never
reaches CPUHP_AP_OFFLINE. As a result, when the CPU stopper thread
finishes its work and the idle task takes over, cpuhp_report_idle_dead()
crashes on 'BUG_ON(st->state != CPUHP_AP_OFFLINE)'.

In the opposite direction, start_secondary() assumes all startup
callbacks have been invoked and transitions to CPUHP_AP_ONLINE_IDLE,
regardless of the written target. This can result in some callbacks in
the section being silently skipped.

Callbacks in the STARTING section must never fail and appear to be
expected to run as one continuous sequence. So, modify both
take_cpu_down() and notify_cpu_starting() to ignore st->target and fully
traverse the STARTING section to its appropriate end state
(CPUHP_AP_OFFLINE and CPUHP_AP_ONLINE, respectively). This resolves the
issue and ensures symmetric behavior in both directions.

[1]: example of reproduction steps:

  # grep 'tick:dying' /sys/devices/system/cpu/hotplug/states
    143: tick:dying # any state in the middle of the section can be used.
                    # (st->cant_stop needs to be false)
  # cat /sys/devices/system/cpu/cpu7/hotplug/target
    238             # fully online
  # echo 143 > /sys/devices/system/cpu/cpu7/hotplug/target

    [  145.091832] ------------[ cut here ]------------
    [  145.092928] kernel BUG at kernel/cpu.c:1365!
    [  145.093960] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
    --(snip)--

  With this patch, the crash no longer occurs and the state transitions
  to the opposite end of the section.

  # echo 143 > /sys/devices/system/cpu/cpu7/hotplug/target
  # cat /sys/devices/system/cpu/cpu7/hotplug/state
    88              # "idle:dead"

Signed-off-by: Koichiro Den <koichiro.den@...onical.com>
---
 kernel/cpu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 85fd7ac4561e..34749792b37e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1269,7 +1269,6 @@ void clear_tasks_mm_cpumask(int cpu)
 static int take_cpu_down(void *_param)
 {
 	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
-	enum cpuhp_state target = max((int)st->target, CPUHP_AP_OFFLINE);
 	int err, cpu = smp_processor_id();
 
 	/* Ensure this CPU doesn't handle any more interrupts. */
@@ -1285,8 +1284,9 @@ static int take_cpu_down(void *_param)
 
 	/*
 	 * Invoke the former CPU_DYING callbacks. DYING must not fail!
+	 * Regardless of st->target, it must run through to CPUHP_AP_OFFLINE.
 	 */
-	cpuhp_invoke_callback_range_nofail(false, cpu, st, target);
+	cpuhp_invoke_callback_range_nofail(false, cpu, st, CPUHP_AP_OFFLINE);
 
 	/* Park the stopper thread */
 	stop_machine_park(cpu);
@@ -1593,15 +1593,15 @@ void smp_shutdown_nonboot_cpus(unsigned int primary_cpu)
 void notify_cpu_starting(unsigned int cpu)
 {
 	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
-	enum cpuhp_state target = min((int)st->target, CPUHP_AP_ONLINE);
 
 	rcutree_report_cpu_starting(cpu);	/* Enables RCU usage on this CPU. */
 	cpumask_set_cpu(cpu, &cpus_booted_once_mask);
 
 	/*
 	 * STARTING must not fail!
+	 * Regardless of st->target, it must run through to CPUHP_AP_ONLINE.
 	 */
-	cpuhp_invoke_callback_range_nofail(true, cpu, st, target);
+	cpuhp_invoke_callback_range_nofail(true, cpu, st, CPUHP_AP_ONLINE);
 }
 
 /*
-- 
2.43.0

