linux-kernel - [REGRESSION] PM / sleep: Unbalanced suspend/resume on late abort causes data abort

Message-ID: <70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com>
Date: Mon, 17 Nov 2025 17:31:05 +0800
From: Rose Wu <ya-jou.wu@...iatek.com>
To: <rafael.j.wysocki@...el.com>, <linux-pm@...r.kernel.org>,
	<regressions@...ts.linux.dev>
CC: <saravanak@...gle.com>, <len.brown@...el.com>, <pavel@...nel.org>,
	<linux-kernel@...r.kernel.org>, wsd_upstream <wsd_upstream@...iatek.com>,
	<linux-mediatek@...ts.infradead.org>,
	士顏 邱 <artis.chiu@...iatek.com>,
	靖智 高 <Johnny-cc.Kao@...iatek.com>
Subject: [REGRESSION] PM / sleep: Unbalanced suspend/resume on late abort
 causes data abort

Hi Rafael and All,

I am reporting a regression introduced by commit
443046d1ad66607f324c604b9fbdf11266fa8aad ("PM: sleep: Make suspend of
devices more asynchronous"), which can lead to a kernel panic (data
abort) when device suspend is aborted in the "late" phase.
The commit changes how the device lists are handled during suspend.
When device suspend aborts in the "late" phase, `dpm_suspended_list` is
spliced into `dpm_late_early_list`.
This creates an imbalance: devices on that list that had not yet
executed `pm_runtime_disable()` in `device_suspend_late()` are still
subjected to `pm_runtime_enable()` in the subsequent
`device_resume_early()` sequence.
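
To illustrate the imbalance, here is a simplified user-space model (not
kernel code). `struct model_dev`, `model_suspend_late()`,
`model_resume_early()` and `run_late_abort()` are hypothetical names; the
counter only loosely mirrors `dev->power.disable_depth`, where runtime
PM is enabled only while the depth is 0. Late suspend runs for the first
`n_late` devices and then aborts; early resume is applied to every device,
as happens after the list splice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of per-device runtime PM state. */
struct model_dev {
	unsigned int disable_depth; /* runtime PM enabled only when 0 */
	unsigned int unbalanced;    /* enables attempted at depth 0 */
};

/* Modeled device_suspend_late(): disables runtime PM (depth++). */
static void model_suspend_late(struct model_dev *d)
{
	d->disable_depth++;
}

/* Modeled device_resume_early() as reported: re-enables runtime PM for
 * every device, whether or not late suspend ran for it. */
static void model_resume_early(struct model_dev *d)
{
	if (d->disable_depth == 0) {
		/* the real pm_runtime_enable() warns instead of underflowing */
		d->unbalanced++;
		return;
	}
	d->disable_depth--;
}

/* Late suspend runs for devs[0..n_late-1], then aborts; early resume is
 * applied to all n devices, as after the list splice. */
void run_late_abort(struct model_dev *devs, size_t n, size_t n_late)
{
	assert(n_late <= n);
	for (size_t i = 0; i < n_late; i++)
		model_suspend_late(&devs[i]);
	for (size_t i = 0; i < n; i++)
		model_resume_early(&devs[i]);
}
```

With devices starting at depths {1, 1, 0} and `n_late = 1`, the first
device ends balanced at depth 1, the second (runtime PM never enabled,
late suspend never run) drops to depth 0, so runtime PM becomes enabled
behind its driver's back, and the third records an unbalanced enable.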

This causes two issues:

1. A flood of warnings in dmesg: "Attempt to enable runtime PM when it
is blocked".
2. A critical failure for simple-bus devices: when
`simple_pm_bus_runtime_resume()` is called for a device whose `bus`
pointer is `NULL`, the driver dereferences that NULL pointer, triggering
a data abort.

Steps to Reproduce:

The issue can be reliably reproduced by forcing a late suspend to
abort.

1. Apply the following modification to the `device_suspend_late()`
function to simulate a wakeup event:
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1568,7 +1568,7 @@ static int device_suspend_late(struct device *dev, pm_message_t state, bool asyn
 	if (async_error)
 		goto Complete;

-	if (pm_wakeup_pending()) {
+	if (1) { /* Force abort for testing */
 		async_error = -EBUSY;
 		goto Complete;
 	}
2. Trigger a system suspend.
3. The system attempts to suspend, aborts in the late phase, and then
hits the data abort during the resume sequence.

Call Trace:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
pc : [0xffffffe3988e81e4] simple_pm_bus_runtime_resume+0x1c/0x90
lr : [0xffffffe398a848d0] pm_generic_runtime_resume+0x40/0x58

As a potential fix, should `device_resume_early()` check whether
runtime PM was actually disabled for the device in
`device_suspend_late()` before calling `pm_runtime_enable()`?
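
A minimal sketch of the kind of check I have in mind, written as a
simplified user-space model rather than a kernel patch. The
`rpm_disabled_by_late` flag and all `model_*` / `run_late_abort()` names
are hypothetical; the counter only loosely mirrors
`dev->power.disable_depth`. The real fix may well use a different
mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not kernel code. */
struct model_dev {
	unsigned int disable_depth; /* runtime PM enabled only when 0 */
	bool rpm_disabled_by_late;  /* hypothetical: set only when the modeled
	                               late-suspend step disabled runtime PM */
};

static void model_suspend_late(struct model_dev *d)
{
	d->disable_depth++; /* stands in for disabling runtime PM */
	d->rpm_disabled_by_late = true;
}

/* Proposed behavior: only undo a disable done by late suspend. */
static void model_resume_early(struct model_dev *d)
{
	if (!d->rpm_disabled_by_late)
		return; /* late suspend never ran: keep the count balanced */
	d->rpm_disabled_by_late = false;
	assert(d->disable_depth > 0);
	d->disable_depth--; /* stands in for pm_runtime_enable() */
}

/* Late suspend runs for devs[0..n_late-1], then aborts; early resume is
 * applied to all n devices, as after the list splice. */
void run_late_abort(struct model_dev *devs, size_t n, size_t n_late)
{
	assert(n_late <= n);
	for (size_t i = 0; i < n_late; i++)
		model_suspend_late(&devs[i]);
	for (size_t i = 0; i < n; i++)
		model_resume_early(&devs[i]);
}
```

With this check, every device ends the aborted cycle at the depth it
started with, so a device whose driver never enabled runtime PM keeps it
disabled and its runtime resume callback is not reached.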

Best Regards,
Rose