[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAJZ5v0hmAm8hJ2tNoamREH_K+vKj7q_VwaqBicKbqmmDNCOUnQ@mail.gmail.com>
Date: Tue, 3 Feb 2026 21:06:12 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Xuewen Yan <xuewen.yan@...soc.com>
Cc: rafael@...nel.org, lenb@...nel.org, pavel@...nel.org,
gregkh@...uxfoundation.org, dakr@...nel.org, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org, ke.wang@...soc.com, di.shen@...soc.com,
xuewen.yan94@...il.com
Subject: Re: [PATCH] PM: sleep: core: Fix sync issues between work_in_progress
and must_resume
On Tue, Feb 3, 2026 at 7:36 AM Xuewen Yan <xuewen.yan@...soc.com> wrote:
>
> There is a synchronization issue when suspend async:
>
> suspend-task: async-child-kworker
> dpm_noirq_suspend_devices
> mutex_lock(dpm_list_mtx) async_suspend_noirq
> list_for_each(dpm_late_early_list) device_suspend_noirq
> dpm_clear_async_state(parent); dpm_run_callback()
> reinit_completion() dpm_superior_set_must_resume(dev)
> parent->power.work_in_progress = false; dev->parent->power.must_resume = true;
Good catch!
The issue would affect suppliers too though, AFAICS.
> Because the power.work_in_progress and power.must_resume use the
> same byte:
> struct dev_pm_info {
> ....
> [56] struct completion completion;
> [104] struct wakeup_source *wakeup;
> [112] bool wakeup_path : 1;
> [112] bool syscore : 1;
> [112] bool no_pm_callbacks : 1;
> [112] bool work_in_progress : 1;
> [112] bool smart_suspend : 1;
> [112] bool must_resume : 1;
> [112] bool may_skip_resume : 1;
> [112] bool strict_midlayer : 1;
> ...
> }
>
> So, if suspend-task and child-kworker modify these two variables
> simultaneously, it will cause mutual overwriting issues.
> More severely, this may result in the work_in_progress variable
> not being set to false, preventing the __dpm_async() function from
> queuing work to execute the parent’s suspend function.
> Consequently, the completion event will never be finalized,
> ultimately causing the suspend process to be blocked.
>
> To resolve the aforementioned issue, the must_resume variable
> should be protected using dpm_list_mtx.
>
> Fixes: aa7a9275ab81 ("PM: sleep: Suspend async parents after suspending children")
No, the patch below doesn't fix this one AFAICS.
> Fixes: 443046d1ad66 ("PM: sleep: Make suspend of devices more asynchronous")
> Signed-off-by: Xuewen Yan <xuewen.yan@...soc.com>
> ---
> drivers/base/power/main.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 97a8b4fcf471..7ab42e065074 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1407,8 +1407,11 @@ static void dpm_superior_set_must_resume(struct device *dev)
> struct device_link *link;
> int idx;
>
> - if (dev->parent)
> + if (dev->parent) {
> + mutex_lock(&dpm_list_mtx);
> dev->parent->power.must_resume = true;
> + mutex_unlock(&dpm_list_mtx);
> + }
This is not sufficient because the suppliers can also be affected and
there are analogous issues in the other suspend phases.
>
> idx = device_links_read_lock();
>
> --
So let me send my version of the fix.
Thanks!
Powered by blists - more mailing lists