linux-kernel - [RFC PATCH] exit: Skip panic in do

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250410143937.1829272-1-Tze-nan.Wu@mediatek.com>
Date: Thu, 10 Apr 2025 22:39:37 +0800
From: Tze-nan Wu <Tze-nan.Wu@...iatek.com>
To: Oleg Nesterov <oleg@...hat.com>, Christian Brauner <brauner@...nel.org>
CC: Andrew Morton <akpm@...ux-foundation.org>, <wsd_upstream@...iatek.com>,
	<bobule.chang@...iatek.com>, Tze-nan Wu <Tze-nan.Wu@...iatek.com>, Matthias
 Brugger <matthias.bgg@...il.com>, AngeloGioacchino Del Regno
	<angelogioacchino.delregno@...labora.com>, chenqiwu <chenqiwu@...omi.com>,
	<linux-kernel@...r.kernel.org>, <linux-arm-kernel@...ts.infradead.org>,
	<linux-mediatek@...ts.infradead.org>
Subject: [RFC PATCH] exit: Skip panic in do_exit() during poweroff

When kernel_power_off() is invoked by a process other than the global
init (PID 1) on a specific core, other CPUs are still allowed to execute
processes, even though the userspace becomes unreliable.

If PID 1 exits due to the unreliable userspace after kernel_power_off()
invoked, the panic follow by the last thread of global init exited in
do_exit() will stop the kernel_power_off() procedure, turn a shutdown
behavior into panic flow(reboot).

Add a condition check to ensure that the panic triggered by the last
thread of the global init exiting, only occurs while:
( system_state != SYSTEM_POWER_OFF and system_state != SYSTEM_RESTART).
Otherwise, WARN() instead.

[On Android 16 with arm64 arch]
Here's a scenario where the global init exits during kernel_power_off:
If PID 1 encounters a page fault after kernel_power_off() has been
invoked, the kernel will fail to handle the page fault because the
disk(UFS) has already shut down.
Consequently, the kernel will send a SIGBUS to PID 1 to indicate the
page fault failure, and ultimately, the panic will occur after PID 1
exits due to receiving the SIGBUS.

            cpu1                           cpu2
          ----------                     ----------
    kernel_power_off() start
        UFS shutdown
            ...                	       PID 1 page fault
            ...                    page fault handle failure
            ...			             PID 1 received SIGBUS
            ...                             panic
   kernel_power_off() not done

Backtrace while PID 1 received signal 7:
   init-1 [007] d..1 41239.922385: \
      signal_generate: sig=7 errno=0 code=2 comm=init pid=1 grp=0 res=0
   init-1 [007] d..1 41239.922389: kernel_stack: <stack trace>
   => __send_signal_locked
   => send_signal_locked
   => force_sig_info_to_task
   => force_sig_fault
   => arm64_force_sig_fault
   => do_page_fault
   => do_translation_fault
   => do_mem_abort
   => el0_ia
   => el0t_64_sync_handler

Simplified kernel log:
kernel_power_off() invoked by pt_notify_thread.
[41239.526109] pt_notify_threa: reboot set flag, old value 0x********,
*.
[41239.526114] pt_notify_threa: reboot set flag new value 0x********.
UFS reject I/O after kerenl_power_off.
[41239.686411]  scsi +scsi******** apexd: sd* ******** rejecting I/O to
offline device.
Lots of I/O error & erofs error happened after kernel_power_off().
[41239.690312] apexd: I/O error, dev sdc, sector ******* op ***:(READ)
flags 0x**** phys_seg ** prio class 0.
[41239.690465] apexd: I/O error, dev sdc, sector ******* op ***:(READ)
flags 0x**** phys_seg ** prio class 0.
...
...
[41239.922265] init: erofs: (device ****): z_erofs_read_folio: read
error * @ *** of nid ********.
[41239.922341] init: erofs: (device ****): z_erofs_read_folio: read
error * @ *** of nid ********.
Finally device panic due to PID 1 received SIGBUS.
[41239.923789] init: Kernel panic - not syncing: Attempted to kill init!
exitcode=0x00000007

Fixes: 43cf75d96409 ("exit: panic before exit_mm() on global init exit")
Link: https://lore.kernel.org/all/20191219104223.xvk6ppfogoxrgmw6@wittgenstein/
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@...iatek.com>
---

I am also wondering if this patch is reasonable?

>From my perspective, there are two reasons not to trigger such panic
during kernel_power_off() or kernel_restart():
  1. It is not worthwhile to interrupt kernel_power_off() by a panic
     resulted from userspace instability.
  2. The panic in do_exit() was originally designed to ensure a usable
     coredump if the last thread of the global init process exited.
	 However, capture a coredump triggered by userspace crash after
     kernel_power_off() seems not particularly useful, in my opinion.
	 
In certain scenarios, a kernel module may need to directly power off
from kernel space to protect hardware (e.g., thermal protection).
In my opinion, rather than causing a panic during kernel_power_off(),
it sounds better to allow the device to complete its power-off process.

Appreciate for any comment on this, if there's any better way to
handle this panic, please point me out.

---
 kernel/exit.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/exit.c b/kernel/exit.c
index 1dcddfe537ee..23cb6b42a1f1 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -901,11 +901,17 @@ void __noreturn do_exit(long code)
 	if (group_dead) {
 		/*
 		 * If the last thread of global init has exited, panic
-		 * immediately to get a useable coredump.
+		 * immediately to get a usable coredump, except when the
+		 * device is currently powering off or restarting.
 		 */
-		if (unlikely(is_global_init(tsk)))
-			panic("Attempted to kill init! exitcode=0x%08x\n",
-				tsk->signal->group_exit_code ?: (int)code);
+		if (unlikely(is_global_init(tsk))) {
+			if (system_state != SYSTEM_POWER_OFF &&
+			    system_state != SYSTEM_RESTART)
+				panic("Attempted to kill init! exitcode=0x%08x\n",
+				      tsk->signal->group_exit_code ?: (int)code);
+			WARN(1, "Attempted to kill init! exitcode=0x%08x\n",
+			     tsk->signal->group_exit_code ?: (int)code);
+		}
 
 #ifdef CONFIG_POSIX_TIMERS
 		hrtimer_cancel(&tsk->signal->real_timer);
-- 
2.45.2