Message-ID: <13779172.uLZWGnKmhe@rjwysocki.net>
Date: Tue, 03 Jun 2025 18:21:57 +0200
From: "Rafael J. Wysocki" <rjw@...ysocki.net>
To: Linux PM <linux-pm@...r.kernel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
Mario Limonciello <mario.limonciello@....com>,
Chris Bainbridge <chris.bainbridge@...il.com>,
Ulf Hansson <ulf.hansson@...aro.org>, Saravana Kannan <saravanak@...gle.com>,
Sudeep Holla <sudeep.holla@....com>
Subject: [PATCH v1 3/3] PM: sleep: Add locking to dpm_async_resume_children()
From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
Commit 0cbef962ce1f ("PM: sleep: Resume children after resuming the
parent") introduced a subtle concurrency issue that may lead to a kernel
crash if system suspend is aborted and may also slow down asynchronous
device resume otherwise.
Namely, the initial list walks in dpm_noirq_resume_devices(),
dpm_resume_early(), and dpm_resume() call dpm_clear_async_state() for
every device and attempt to asynchronously resume it if it has no
parent (so it is a "root" device). The asynchronous resume of a
root device triggers an attempt to asynchronously resume its children
which may take place before calling dpm_clear_async_state() for them
due to the lack of synchronization between dpm_async_resume_children()
and the code calling dpm_clear_async_state(). If this happens, the
late dpm_clear_async_state() call will clear
power.work_in_progress for the given device after it has been set by
__dpm_async(), so the resume callback will be allowed to run once
again for the same device during the same transition. This leads to
a whole range of interesting breakage.
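The ordering described above can be replayed deterministically in user
space. The following is only a minimal sketch, not kernel code: the toy_*
names are hypothetical stand-ins for dpm_clear_async_state(), __dpm_async()
and power.work_in_progress, the callback runs inline instead of
asynchronously, and the flag starts out cleared, as it may be after an
aborted suspend.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-device PM state discussed above. */
struct toy_dev {
	bool work_in_progress;	/* models power.work_in_progress */
	int resume_calls;	/* times the resume callback ran */
};

/* Models dpm_clear_async_state(): forget any earlier scheduling. */
static void toy_clear_async_state(struct toy_dev *dev)
{
	dev->work_in_progress = false;
}

/*
 * Models __dpm_async(): accept the device only if it has not been
 * scheduled already. The callback runs inline to keep the replay simple.
 */
static bool toy_async(struct toy_dev *dev)
{
	if (dev->work_in_progress)
		return false;

	dev->work_in_progress = true;
	dev->resume_calls++;
	return true;
}

/* Replays the problematic ordering for a single child device. */
static int toy_replay_race(void)
{
	struct toy_dev child = { .work_in_progress = false, .resume_calls = 0 };
	bool first, second;

	/* 1. The parent's async resume reaches the child first. */
	first = toy_async(&child);
	/* 2. The initial list walk gets to the child late and clears the flag. */
	toy_clear_async_state(&child);
	/* 3. A later check of the flag lets the same device through again. */
	second = toy_async(&child);

	/* Both attempts were accepted, which is the problem being shown. */
	assert(first && second);

	return child.resume_calls;
}
```

In this model, toy_replay_race() reports two callback invocations for the
same device within one transition, because the late clear in step 2 undoes
the protection established in step 1.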
Fortunately, if the suspend transition is not aborted, power.work_in_progress
is set by it for all devices, so dpm_async_resume_children() will not
schedule asynchronous resume for them until dpm_clear_async_state()
clears that flag, but this means missing an opportunity to start the
resume of those devices earlier.
Address the above issue by adding dpm_list_mtx locking to
dpm_async_resume_children(), so it will wait for the entire initial
list walk and the invocation of dpm_clear_async_state() for all devices
to be completed before scheduling any new asynchronous resume callbacks.
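As an illustration of why holding the list mutex is sufficient, here is a
hedged user-space sketch using POSIX threads. It is not the kernel
implementation: the toy_* names are hypothetical, toy_list_mtx plays the
role of dpm_list_mtx, device 0 is assumed to be the only root, and explicit
lock/unlock calls are used where the patch relies on guard(mutex), which
drops the lock at the end of the enclosing scope.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_DEVS 3	/* device 0 is the root; 1 and 2 are its children */

struct toy_dev {
	bool work_in_progress;	/* models power.work_in_progress */
	int resume_calls;	/* times the resume callback ran */
};

static struct toy_dev toy_devs[TOY_NR_DEVS];
/* Plays the role of dpm_list_mtx in this sketch. */
static pthread_mutex_t toy_list_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Caller holds toy_list_mtx. Returns true if the device was newly claimed. */
static bool toy_claim(struct toy_dev *dev)
{
	if (dev->work_in_progress)
		return false;

	dev->work_in_progress = true;
	return true;
}

/* Models dpm_async_resume_children() with the added locking. */
static void toy_async_resume_children(void)
{
	/* Blocks until the initial list walk has dropped the mutex. */
	pthread_mutex_lock(&toy_list_mtx);
	for (int i = 1; i < TOY_NR_DEVS; i++) {
		if (toy_claim(&toy_devs[i]))
			toy_devs[i].resume_calls++;	/* callback run inline */
	}
	pthread_mutex_unlock(&toy_list_mtx);
}

/* Async resume of the root device, followed by its children. */
static void *toy_async_resume_root(void *unused)
{
	(void)unused;
	toy_devs[0].resume_calls++;
	toy_async_resume_children();
	return NULL;
}

/* Returns true if every device was resumed exactly once. */
static bool toy_run_resume_phase(void)
{
	pthread_t root_thread;
	bool started = false;
	bool ok = true;
	int err;

	for (int i = 0; i < TOY_NR_DEVS; i++)
		toy_devs[i].resume_calls = 0;

	/* Initial list walk: clear state for all devices under the mutex. */
	pthread_mutex_lock(&toy_list_mtx);
	for (int i = 0; i < TOY_NR_DEVS; i++) {
		toy_devs[i].work_in_progress = false;
		if (i == 0 && toy_claim(&toy_devs[i]))
			started = !pthread_create(&root_thread, NULL,
						  toy_async_resume_root, NULL);
	}
	pthread_mutex_unlock(&toy_list_mtx);

	if (!started)
		return false;

	err = pthread_join(root_thread, NULL);
	assert(err == 0);
	(void)err;

	for (int i = 0; i < TOY_NR_DEVS; i++)
		ok = ok && toy_devs[i].resume_calls == 1;

	return ok;
}
```

Because the root's thread cannot enter toy_async_resume_children() until
the walk has released toy_list_mtx, no flag is cleared after it has been
set, and toy_run_resume_phase() returns true regardless of how the threads
are scheduled.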
Fixes: 0cbef962ce1f ("PM: sleep: Resume children after resuming the parent")
Reported-and-tested-by: Chris Bainbridge <chris.bainbridge@...il.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
---
drivers/base/power/main.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -638,6 +638,13 @@
 static void dpm_async_resume_children(struct device *dev, async_func_t func)
 {
 	/*
+	 * Prevent racing with dpm_clear_async_state() during initial list
+	 * walks in dpm_noirq_resume_devices(), dpm_resume_early(), and
+	 * dpm_resume().
+	 */
+	guard(mutex)(&dpm_list_mtx);
+
+	/*
 	 * Start processing "async" children of the device unless it's been
 	 * started already for them.
 	 *