linux-kernel - Re: [PATCH v2 2/3] PM: sleep: Suspend parents and suppliers after suspending subordinates

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGETcx8WwG96FEOyKR-qFA=S6GhpH-EKpVtghNtxt-CQ-3UB_g@mail.gmail.com>
Date: Thu, 13 Mar 2025 14:16:48 -0700
From: Saravana Kannan <saravanak@...gle.com>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>
Cc: Linux PM <linux-pm@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>, 
	Alan Stern <stern@...land.harvard.edu>, Ulf Hansson <ulf.hansson@...aro.org>, 
	Johan Hovold <johan@...nel.org>, Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>, 
	Jon Hunter <jonathanh@...dia.com>
Subject: Re: [PATCH v2 2/3] PM: sleep: Suspend parents and suppliers after
 suspending subordinates

On Thu, Mar 13, 2025 at 1:35 PM Rafael J. Wysocki <rjw@...ysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>
> In analogy with the previous change affecting the resume path,
> make device_suspend() start the async suspend of the device's parent
> and suppliers after the device itself has been processed and make
> dpm_suspend() start processing "async" leaf devices (that is, devices
> without children or consumers) upfront because they don't need to wait
> for any other devices.
>
> On the Dell XPS13 9360 in my office, this change reduces the total
> duration of device suspend by approximately 100 ms (over 20%).
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> Suggested-by: Saravana Kannan <saravanak@...gle.com>
> ---
>
> v1 -> v2:
>    * Adjust for the changes in patch [1/3].
>    * Fix walking suppliers in dpm_async_suspend_superior().
>    * Use device links read locking in dpm_async_suspend_superior() (Saravana).
>    * Move all devices to the target list even if there are errors in
>      dpm_suspend() so they are properly resumed during rollback (Saravana).
>
> ---
>  drivers/base/power/main.c |   78 ++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 72 insertions(+), 6 deletions(-)
>
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1231,6 +1231,50 @@
>
>  /*------------------------- Suspend routines -------------------------*/
>
> +static bool dpm_leaf_device(struct device *dev)
> +{
> +       struct device *child;
> +
> +       lockdep_assert_held(&dpm_list_mtx);
> +
> +       child = device_find_any_child(dev);
> +       if (child) {
> +               put_device(child);
> +
> +               return false;
> +       }
> +
> +       /*
> +        * Since this function is required to run under dpm_list_mtx, the
> +        * list_empty() below will only return true if the device's list of
> +        * consumers is actually empty before calling it.
> +        */
> +       return list_empty(&dev->links.consumers);
> +}

We need the equivalent of this for resume.

> +
> +static void dpm_async_suspend_superior(struct device *dev, async_func_t func)
> +{
> +       struct device_link *link;
> +       int idx;
> +
> +       mutex_lock(&dpm_list_mtx);
> +
> +       /* Start processing the device's parent if it is "async". */
> +       if (dev->parent)
> +               dpm_async_with_cleanup(dev->parent, func);
> +
> +       mutex_unlock(&dpm_list_mtx);
> +
> +       idx = device_links_read_lock();
> +
> +       /* Start processing the device's "async" suppliers. */
> +       list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
> +               if (READ_ONCE(link->status) != DL_STATE_DORMANT)
> +                       dpm_async_with_cleanup(link->supplier, func);

We should check that the rest of the consumers of the supplier are
"done" before we queue the supplier. With 386 device links (and the
number only increases as we add support for more properties), there's
no doubt that we'll hit this often.

> +
> +       device_links_read_unlock(idx);

Is passing idx to unlock a new (within the past 6 months) thing? I
don't remember having to do this in the past.

> +}
> +
>  /**
>   * resume_event - Return a "resume" message for given "suspend" sleep state.
>   * @sleep_state: PM message representing a sleep state.
> @@ -1656,6 +1700,8 @@
>         device_links_read_unlock(idx);
>  }
>
> +static void async_suspend(void *data, async_cookie_t cookie);
> +
>  /**
>   * device_suspend - Execute "suspend" callbacks for given device.
>   * @dev: Device to handle.
> @@ -1785,7 +1831,13 @@
>
>         complete_all(&dev->power.completion);
>         TRACE_SUSPEND(error);
> -       return error;
> +
> +       if (error || async_error)
> +               return error;
> +
> +       dpm_async_suspend_superior(dev, async_suspend);
> +
> +       return 0;
>  }
>
>  static void async_suspend(void *data, async_cookie_t cookie)
> @@ -1803,6 +1855,7 @@
>  int dpm_suspend(pm_message_t state)
>  {
>         ktime_t starttime = ktime_get();
> +       struct device *dev;
>         int error = 0;
>
>         trace_suspend_resume(TPS("dpm_suspend"), state.event, true);
> @@ -1816,12 +1869,28 @@
>
>         mutex_lock(&dpm_list_mtx);
>
> +       /*
> +        * Start processing "async" leaf devices upfront because they don't need
> +        * to wait.
> +        */
> +       list_for_each_entry_reverse(dev, &dpm_prepared_list, power.entry) {
> +               dpm_clear_async_state(dev);
> +               if (dpm_leaf_device(dev))
> +                       dpm_async_with_cleanup(dev, async_suspend);
> +       }
> +
>         while (!list_empty(&dpm_prepared_list)) {
> -               struct device *dev = to_device(dpm_prepared_list.prev);
> +               dev = to_device(dpm_prepared_list.prev);
>
>                 list_move(&dev->power.entry, &dpm_suspended_list);
>
> -               dpm_clear_async_state(dev);
> +               /*
> +                * Move all devices to the target list to resume them properly
> +                * on errors.
> +                */

I did this initially on my end, but we have so many devices that
looping through them had a measurable impact. It's better to just
splice the lists on error.

-Saravana

> +               if (error || async_error)
> +                       continue;
> +
>                 if (dpm_async_fn(dev, async_suspend))
>                         continue;
>
> @@ -1834,9 +1903,6 @@
>                 put_device(dev);
>
>                 mutex_lock(&dpm_list_mtx);
> -
> -               if (error || async_error)
> -                       break;
>         }
>
>         mutex_unlock(&dpm_list_mtx);
>
>
>