Message-ID: <70b25dca6f8c2756d78f076f4a7dee7edaaffc33.camel@mediatek.com>
Date: Mon, 17 Nov 2025 17:31:05 +0800
From: Rose Wu <ya-jou.wu@...iatek.com>
To: <rafael.j.wysocki@...el.com>, <linux-pm@...r.kernel.org>,
<regressions@...ts.linux.dev>
CC: <saravanak@...gle.com>, <len.brown@...el.com>, <pavel@...nel.org>,
<linux-kernel@...r.kernel.org>, wsd_upstream <wsd_upstream@...iatek.com>,
<linux-mediatek@...ts.infradead.org>,
士顏 邱 <artis.chiu@...iatek.com>,
靖智 高 <Johnny-cc.Kao@...iatek.com>
Subject: [REGRESSION] PM / sleep: Unbalanced suspend/resume on late abort
causes data abort
Hi Rafael and All,
I am reporting a regression introduced by commit
443046d1ad66607f324c604b9fbdf11266fa8aad ("PM: sleep: Make suspend of
devices more asynchronous"), which can lead to a kernel panic (data
abort) if a late suspend aborts.
The commit modifies list handling during suspend. When a device suspend
aborts at the "late" stage, `dpm_suspended_list` is spliced into
`dpm_late_early_list`.
This creates an imbalance. Devices on this list that had not yet
executed `pm_runtime_disable()` in `device_suspend_late()` are now
incorrectly subjected to `pm_runtime_enable()` during the subsequent
`device_resume_early()` sequence.
This causes two issues:
1. Numerous error messages in dmesg: "Attempt to enable runtime PM when
it is blocked."
2. A critical failure for simple-bus devices: when
`simple_pm_bus_runtime_resume()` is called for a device whose bus is
`NULL`, the kernel dereferences the NULL bus struct pointer, triggering
a data abort.
Steps to Reproduce:
The issue can be reliably reproduced by forcing a late suspend to
abort.
1. Apply the following modification to the `device_suspend_late()`
function to simulate a wakeup event:
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1568,7 +1568,7 @@ static int device_suspend_late(struct device *dev, pm_message_t state, bool asyn
 	if (async_error)
 		goto Complete;
 
-	if (pm_wakeup_pending()) {
+	if (1) { /* Force abort for testing */
 		async_error = -EBUSY;
 		goto Complete;
 	}
2. Trigger a system suspend.
3. The system will attempt to suspend, abort at the late stage, and
then trigger the data abort during the resume sequence.
Call Trace:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
pc : [0xffffffe3988e81e4] simple_pm_bus_runtime_resume+0x1c/0x90
lr : [0xffffffe398a848d0] pm_generic_runtime_resume+0x40/0x58
As a potential fix, would it make sense to add a conditional check in
`device_resume_early()` so that `pm_runtime_enable()` is only invoked
for a device that actually went through `pm_runtime_disable()` in
`device_suspend_late()`?
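A rough userspace sketch of the kind of check I have in mind follows. It is only an illustration under assumptions: the `late_suspended` flag and the `sketch_*` names are hypothetical and do not refer to existing kernel fields or functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state, for illustration only. */
struct sketch_dev {
	int disable_depth;	/* 0 means runtime PM is enabled */
	bool late_suspended;	/* hypothetical: set once late suspend ran */
};

/* Models the late suspend step: disable runtime PM and record it. */
static void sketch_suspend_late(struct sketch_dev *dev)
{
	dev->disable_depth++;
	dev->late_suspended = true;
}

/*
 * Models the early resume step with the proposed guard: runtime PM is
 * re-enabled only if late suspend disabled it for this device.
 * Returns true if an enable was performed.
 */
static bool sketch_resume_early(struct sketch_dev *dev)
{
	if (!dev->late_suspended)
		return false;	/* skipped: nothing to balance */

	assert(dev->disable_depth > 0);
	dev->disable_depth--;
	dev->late_suspended = false;
	return true;
}
```

In this model, a device that never reached the late suspend step is left alone by `sketch_resume_early()`, so its counter stays balanced.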
Best Regards,
Rose