lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <13779172.uLZWGnKmhe@rjwysocki.net>
Date: Tue, 03 Jun 2025 18:21:57 +0200
From: "Rafael J. Wysocki" <rjw@...ysocki.net>
To: Linux PM <linux-pm@...r.kernel.org>
Cc: LKML <linux-kernel@...r.kernel.org>,
 Mario Limonciello <mario.limonciello@....com>,
 Chris Bainbridge <chris.bainbridge@...il.com>,
 Ulf Hansson <ulf.hansson@...aro.org>, Saravana Kannan <saravanak@...gle.com>,
 Sudeep Holla <sudeep.holla@....com>
Subject: [PATCH v1 3/3] PM: sleep: Add locking to dpm_async_resume_children()

From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>

Commit 0cbef962ce1f ("PM: sleep: Resume children after resuming the
parent") introduced a subtle concurrency issue that may lead to a kernel
crash if system suspend is aborted and may also slow down asynchronous
device resume otherwise.

Namely, the initial list walks in dpm_noirq_resume_devices(),
dpm_resume_early(), and dpm_resume() call dpm_clear_async_state() for
every device and attepmt to asynchronously resume it if it has no
children (so it is a "root" device).  The asynchronous resume of a
root device triggers an attempt to asynchronously resume its children
which may take place before calling dpm_clear_async_state() for them
due to the lack of synchronization between dpm_async_resume_children()
and the code calling dpm_clear_async_state().  If this happens, the
dpm_clear_async_state() that comes in late, will clear
power.work_in_progress for the given device after it has been set by
__dpm_async(), so the suspend callback will be allowed to run once
again for the same device during the same transition.  This leads to
a whole range of interesting breakage.

Fortunately, if the suspend transition is not aborted, power.work_in_progress
is set by it for all devices, so dpm_async_resume_children() will not
schedule asynchronous resume for them until dpm_clear_async_state()
clears that flag, but this means missing an opportunity to start the
resume of those devices earlier.

Address the above issue by adding dpm_list_mtx locking to
dpm_async_resume_children(), so it will wait for the entire initial
list walk and the invocation of dpm_clear_async_state() for all devices
to be completed before scheduling any new asynchronous resume callbacks.

Fixes: 0cbef962ce1f ("PM: sleep: Resume children after resuming the parent")
Reported-and-tested-by: Chris Bainbridge <chris.bainbridge@...il.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
---
 drivers/base/power/main.c |    7 +++++++
 1 file changed, 7 insertions(+)

--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -638,6 +638,13 @@
 static void dpm_async_resume_children(struct device *dev, async_func_t func)
 {
 	/*
+	 * Prevent racing with dpm_clear_async_state() during initial list
+	 * walks in dpm_noirq_resume_devices(), dpm_resume_early(), and
+	 * dpm_resume().
+	 */
+	guard(mutex)(&dpm_list_mtx);
+
+	/*
 	 * Start processing "async" children of the device unless it's been
 	 * started already for them.
 	 *




Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ