linux-kernel - Re: [PATCH] cpu/hotplug: disallow writing any state in atomic AP section to sysfs target

Message-ID: <Z5Oy41fzm8PHMoDy@linux.ibm.com>
Date: Fri, 24 Jan 2025 21:03:55 +0530
From: Vishal Chourasia <vishalc@...ux.ibm.com>
To: Koichiro Den <koichiro.den@...onical.com>
Cc: linux-kernel@...r.kernel.org, tglx@...utronix.de, peterz@...radead.org
Subject: Re: [PATCH] cpu/hotplug: disallow writing any state in atomic AP
 section to sysfs target

On Fri, Dec 20, 2024 at 11:15:38PM +0900, Koichiro Den wrote:
> When CONFIG_CPU_HOTPLUG_STATE_CONTROL=y, writing a state within the
> atomic AP section to 'hotplug/target' file for a fully online cpu can
> cause a kernel crash [1]. This occurs because take_cpu_down() disables
> the CPU, but the state machine does not reach CPUHP_AP_OFFLINE. As a
> result, when cpu stopper thread finishes its work and idle task takes
> over, cpuhp_report_idle_dead() crashes on 'BUG_ON(st->state !=
> CPUHP_AP_OFFLINE)'.
> 
> In the opposite direction, start_secondary() assumes all startup
> callbacks have been invoked and transitions to CPUHP_AP_ONLINE_IDLE,
> regardless of the written target. This can result in some callbacks in
> the section being silently skipped.
> 
> To address the issue, disable writing any state within the atomic AP
> states to sysfs target. Additionally, set cant_stop to true for both
> CPUHP_BP_KICK_AP (when CONFIG_HOTPLUG_SPLIT_STARTUP=y) and
> CPUHP_AP_ONLINE since we do not automatically make the state machine
> proceed to the other end of the atomic states.
> 
> [1]:
> 
>   # grep 'tick:dying' /sys/devices/system/cpu/hotplug/states
>     143: tick:dying
>   # cat /sys/devices/system/cpu/cpu7/hotplug/target
>     238  # fully online
>   # echo 143 > /sys/devices/system/cpu/cpu7/hotplug/target
After applying the patch, I don't see any system crash. 

Thank you for the patch!

Tested-by: Vishal Chourasia <vishalc@...ux.ibm.com>
> 
>     [  145.091832] ------------[ cut here ]------------
>     [  145.092928] kernel BUG at kernel/cpu.c:1365!
>     [  145.093960] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
>     --(snip)--
> 
> Signed-off-by: Koichiro Den <koichiro.den@...onical.com>
> ---
> Previous attempt:
> https://lore.kernel.org/all/20241207144721.2828390-1-koichiro.den@canonical.com/
> ---
>  kernel/cpu.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index 34f1a09349fc..c877443f5888 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2127,6 +2127,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
>  	[CPUHP_BP_KICK_AP] = {
>  		.name			= "cpu:kick_ap",
>  		.startup.single		= cpuhp_kick_ap_alive,
> +		.cant_stop		= true,
>  	},
>  
>  	/*
> @@ -2192,6 +2193,7 @@ static struct cpuhp_step cpuhp_hp_states[] = {
>  	 * state for synchronsization */
>  	[CPUHP_AP_ONLINE] = {
>  		.name			= "ap:online",
> +		.cant_stop		= true,
>  	},
>  	/*
>  	 * Handled on control processor until the plugged processor manages
> @@ -2759,7 +2761,8 @@ static ssize_t target_store(struct device *dev, struct device_attribute *attr,
>  		return ret;
>  
>  #ifdef CONFIG_CPU_HOTPLUG_STATE_CONTROL
> -	if (target < CPUHP_OFFLINE || target > CPUHP_ONLINE)
> +	if (target < CPUHP_OFFLINE || target > CPUHP_ONLINE ||
> +	    cpuhp_is_atomic_state(target))
>  		return -EINVAL;
>  #else
>  	if (target != CPUHP_OFFLINE && target != CPUHP_ONLINE)
Pretty straightforward.
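
For illustration only, here is a minimal user-space sketch of the validation that the hunk above adds. It is a toy model, not kernel code: `enum model_state` and every `model_*` function are hypothetical names, the enum values are made up, and the sketch assumes that the atomic section is the half-open range from the idle-dead state up to (but not including) the AP-online state, and that states marked `cant_stop` are also refused as targets.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, heavily trimmed stand-in for enum cpuhp_state.
 * Only the ordering matters; the values match no real kernel.
 */
enum model_state {
	M_OFFLINE,		/* models CPUHP_OFFLINE */
	M_BP_KICK_AP,		/* models CPUHP_BP_KICK_AP */
	M_AP_IDLE_DEAD,		/* assumed first state of the atomic AP section */
	M_AP_OFFLINE,		/* models CPUHP_AP_OFFLINE */
	M_AP_DYING_EXAMPLE,	/* made-up atomic state, e.g. a "dying" step */
	M_AP_ONLINE,		/* models CPUHP_AP_ONLINE */
	M_ONLINE,		/* models CPUHP_ONLINE */
};

/* Assumption: the atomic section is [M_AP_IDLE_DEAD, M_AP_ONLINE). */
static bool model_is_atomic_state(int state)
{
	return state >= M_AP_IDLE_DEAD && state < M_AP_ONLINE;
}

/* The two states the patch marks with .cant_stop = true. */
static bool model_cant_stop(int state)
{
	return state == M_BP_KICK_AP || state == M_AP_ONLINE;
}

/* Returns 0 if the written target is accepted, -1 in place of -EINVAL. */
static int model_target_store(int target)
{
	/* Range check plus the new atomic-section check from the patch. */
	if (target < M_OFFLINE || target > M_ONLINE ||
	    model_is_atomic_state(target))
		return -1;

	/* Assumed follow-up check: cant_stop states are refused as well. */
	if (model_cant_stop(target))
		return -1;

	return 0;
}
```

In this model, writing the made-up dying state is rejected, while the two end states (`M_OFFLINE` and `M_ONLINE`) are still accepted.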

While I agree that this is one way to patch things up and, at the very
least, prevent a kernel crash, I wonder what infrastructure is needed to
make CONFIG_CPU_HOTPLUG_STATE_CONTROL=y work for the prepare phase.

IIUC:  
1. The control CPU invokes `takedown_cpu()`, which sets the new
target state as `CPUHP_AP_OFFLINE`.  

2. `take_cpu_down()` is executed on the AP in the stopper thread context
and starts invoking the prepare phase callbacks.  

3. Suppose the user wants to stop at some state in the
prepare phase; after executing that state's teardown callback, the
stopper thread returns control back to `takedown_cpu()` on the control
CPU.  

4. Here's the part I am not sure about: on CPU 6, after the stopper
thread finishes, a context switch occurs and the idle thread starts
executing. `do_idle()` then calls `cpuhp_report_idle_dead()`, which hits
`BUG_ON(st->state != CPUHP_AP_OFFLINE)`.

So, the problem arises from the stopper thread going away in the middle
(depending on which step in the prepare phase was set as the target).
The idle thread starts next and expects that, if the AP is marked as
offline, its current cpuhp_state should be CPUHP_AP_OFFLINE. However, it
won't be, causing a kernel crash in cpuhp_report_idle_dead().
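
The failure mode described above can be sketched as a small user-space model. This is not the kernel implementation: `struct model_cpu`, `enum model_state` and the `model_*` functions are hypothetical, the dying states are invented, and the sketch assumes that the teardown walks the states downwards one at a time and never goes below the AP-offline state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed state list; only the ordering is meaningful. */
enum model_state {
	M_AP_IDLE_DEAD,		/* models CPUHP_AP_IDLE_DEAD */
	M_AP_OFFLINE,		/* models CPUHP_AP_OFFLINE */
	M_AP_DYING_A,		/* made-up atomic (dying) state */
	M_AP_DYING_B,		/* made-up atomic (dying) state */
	M_AP_ONLINE,		/* models CPUHP_AP_ONLINE */
};

struct model_cpu {
	enum model_state state;	/* models st->state */
	bool online;		/* models the CPU's online bit */
};

/*
 * Models the stopper-thread step: the CPU is marked offline first,
 * then the state is walked down to the requested target, but never
 * below M_AP_OFFLINE.
 */
static void model_take_cpu_down(struct model_cpu *c, enum model_state target)
{
	enum model_state stop = target > M_AP_OFFLINE ? target : M_AP_OFFLINE;

	assert(c != NULL);
	c->online = false;
	while (c->state > stop)
		c->state = (enum model_state)(c->state - 1); /* teardown callback would run here */
}

/*
 * Models the idle-loop check: an offline CPU entering idle is expected
 * to be at M_AP_OFFLINE. Returns true where the real BUG_ON would fire.
 */
static bool model_idle_would_bug(const struct model_cpu *c)
{
	return !c->online && c->state != M_AP_OFFLINE;
}
```

With a target inside the atomic section (for example `M_AP_DYING_A`), the walk stops early while the CPU is already marked offline, so the idle check trips. With `M_AP_OFFLINE` as the target, the walk reaches the expected state and the check passes.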


 vishalc
> -- 
> 2.43.0
>