linux-kernel - Re: [PATCH v2 2/3] PM: sleep: Suspend parents and suppliers after suspending subordinates

Message-ID: <CAJZ5v0j0kD_HjbVzHb7=5nEYx_5fzwoBupPBQs2Fpfzho_amfg@mail.gmail.com>
Date: Thu, 13 Mar 2025 23:36:04 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Saravana Kannan <saravanak@...gle.com>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, Linux PM <linux-pm@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, Alan Stern <stern@...land.harvard.edu>, 
	Ulf Hansson <ulf.hansson@...aro.org>, Johan Hovold <johan@...nel.org>, 
	Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>, Jon Hunter <jonathanh@...dia.com>
Subject: Re: [PATCH v2 2/3] PM: sleep: Suspend parents and suppliers after
 suspending subordinates

On Thu, Mar 13, 2025 at 10:17 PM Saravana Kannan <saravanak@...gle.com> wrote:
>
> On Thu, Mar 13, 2025 at 1:35 PM Rafael J. Wysocki <rjw@...ysocki.net> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > In analogy with the previous change affecting the resume path,
> > make device_suspend() start the async suspend of the device's parent
> > and suppliers after the device itself has been processed and make
> > dpm_suspend() start processing "async" leaf devices (that is, devices
> > without children or consumers) upfront because they don't need to wait
> > for any other devices.
> >
> > On the Dell XPS13 9360 in my office, this change reduces the total
> > duration of device suspend by approximately 100 ms (over 20%).
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > Suggested-by: Saravana Kannan <saravanak@...gle.com>
> > ---
> >
> > v1 -> v2:
> >    * Adjust for the changes in patch [1/3].
> >    * Fix walking suppliers in dpm_async_suspend_superior().
> >    * Use device links read locking in dpm_async_suspend_superior() (Saravana).
> >    * Move all devices to the target list even if there are errors in
> >      dpm_suspend() so they are properly resumed during rollback (Saravana).
> >
> > ---
> >  drivers/base/power/main.c |   78 ++++++++++++++++++++++++++++++++++++++++++----
> >  1 file changed, 72 insertions(+), 6 deletions(-)
> >
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -1231,6 +1231,50 @@
> >
> >  /*------------------------- Suspend routines -------------------------*/
> >
> > +static bool dpm_leaf_device(struct device *dev)
> > +{
> > +       struct device *child;
> > +
> > +       lockdep_assert_held(&dpm_list_mtx);
> > +
> > +       child = device_find_any_child(dev);
> > +       if (child) {
> > +               put_device(child);
> > +
> > +               return false;
> > +       }
> > +
> > +       /*
> > +        * Since this function is required to run under dpm_list_mtx, the
> > +        * list_empty() below will only return true if the device's list of
> > +        * consumers is actually empty before calling it.
> > +        */
> > +       return list_empty(&dev->links.consumers);
> > +}
>
> We need the equivalent of this for resume.

Maybe.

> > +
> > +static void dpm_async_suspend_superior(struct device *dev, async_func_t func)
> > +{
> > +       struct device_link *link;
> > +       int idx;
> > +
> > +       mutex_lock(&dpm_list_mtx);
> > +
> > +       /* Start processing the device's parent if it is "async". */
> > +       if (dev->parent)
> > +               dpm_async_with_cleanup(dev->parent, func);
> > +
> > +       mutex_unlock(&dpm_list_mtx);
> > +
> > +       idx = device_links_read_lock();
> > +
> > +       /* Start processing the device's "async" suppliers. */
> > +       list_for_each_entry_rcu(link, &dev->links.suppliers, c_node)
> > +               if (READ_ONCE(link->status) != DL_STATE_DORMANT)
> > +                       dpm_async_with_cleanup(link->supplier, func);
>
> We should check that the rest of the consumers of the supplier are
> "done" before we queue the supplier. With 386 device links (and the
> number only increases as we add support for more properties), there's
> no doubt that we'll hit this often.

And I'm not doing this until I see data confirming that it makes a
difference on the order of 10% or more.
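
For reference, the check suggested above could be modelled roughly as
follows. In the patch as posted, a supplier is handed to the async code
as soon as any one of its consumers has been processed; the sketch below
holds it back until the last consumer is done. This is a userspace toy
model only: struct toy_dev, toy_add_link() and toy_suspend_done() are
hypothetical names, not PM core code, and there is no locking.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_SUPPLIERS 4

/* Hypothetical stand-in for a device with supplier links. */
struct toy_dev {
	const char *name;
	int pending_consumers;	/* consumers that have not suspended yet */
	struct toy_dev *suppliers[TOY_MAX_SUPPLIERS];
	size_t nr_suppliers;
	int queued;		/* set once the device is handed to the async queue */
};

/* Record a consumer -> supplier link and count the consumer as pending. */
static void toy_add_link(struct toy_dev *consumer, struct toy_dev *supplier)
{
	assert(consumer->nr_suppliers < TOY_MAX_SUPPLIERS);
	consumer->suppliers[consumer->nr_suppliers++] = supplier;
	supplier->pending_consumers++;
}

/*
 * Called after @dev has been suspended.  A supplier is queued only when
 * its last pending consumer has finished.  Returns the number of
 * suppliers queued by this call.
 */
static int toy_suspend_done(struct toy_dev *dev)
{
	int queued = 0;
	size_t i;

	for (i = 0; i < dev->nr_suppliers; i++) {
		struct toy_dev *sup = dev->suppliers[i];

		assert(sup->pending_consumers > 0);
		if (--sup->pending_consumers == 0 && !sup->queued) {
			sup->queued = 1;
			queued++;
		}
	}

	return queued;
}
```

With two consumers linked to one supplier, the first toy_suspend_done()
call queues nothing and the second one queues the supplier. Whether the
extra bookkeeping pays off in practice is exactly the question of data
raised above.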

> > +
> > +       device_links_read_unlock(idx);
>
> Is passing idx to unlock a new (within the past 6 months) thing? I
> don't remember having to do this in the past.

It's SRCU and it's been there forever.
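
To illustrate why an SRCU-style read lock hands back an index that has
to be passed to the unlock side, here is a minimal userspace sketch.
It is a single-threaded toy model, not the kernel's SRCU
implementation; struct toy_srcu and the toy_srcu_*() helpers are
hypothetical names made up for this example.

```c
#include <assert.h>

/* Toy SRCU-like domain: one reader counter per epoch. */
struct toy_srcu {
	int active;	/* epoch that new readers join (0 or 1) */
	int readers[2];	/* readers currently inside each epoch */
};

/* Enter a read-side section; the return value names the epoch joined. */
static int toy_srcu_read_lock(struct toy_srcu *s)
{
	int idx = s->active;

	s->readers[idx]++;
	return idx;
}

/* Leave the read-side section that was entered with @idx. */
static void toy_srcu_read_unlock(struct toy_srcu *s, int idx)
{
	assert(idx == 0 || idx == 1);
	assert(s->readers[idx] > 0);
	s->readers[idx]--;
}

/* Updater side: flip the epoch so new readers stop joining the old one. */
static int toy_srcu_flip(struct toy_srcu *s)
{
	int old = s->active;

	s->active = !old;
	return old;
}

/* The old epoch has drained once no reader that joined it is left. */
static int toy_srcu_epoch_idle(const struct toy_srcu *s, int old_idx)
{
	return s->readers[old_idx] == 0;
}
```

A reader that entered before the flip must decrement the counter of
the epoch it joined, not the currently active one, which is why the
index travels from the lock call to the unlock call.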

> > +}
> > +
> >  /**
> >   * resume_event - Return a "resume" message for given "suspend" sleep state.
> >   * @sleep_state: PM message representing a sleep state.
> > @@ -1656,6 +1700,8 @@
> >         device_links_read_unlock(idx);
> >  }
> >
> > +static void async_suspend(void *data, async_cookie_t cookie);
> > +
> >  /**
> >   * device_suspend - Execute "suspend" callbacks for given device.
> >   * @dev: Device to handle.
> > @@ -1785,7 +1831,13 @@
> >
> >         complete_all(&dev->power.completion);
> >         TRACE_SUSPEND(error);
> > -       return error;
> > +
> > +       if (error || async_error)
> > +               return error;
> > +
> > +       dpm_async_suspend_superior(dev, async_suspend);
> > +
> > +       return 0;
> >  }
> >
> >  static void async_suspend(void *data, async_cookie_t cookie)
> > @@ -1803,6 +1855,7 @@
> >  int dpm_suspend(pm_message_t state)
> >  {
> >         ktime_t starttime = ktime_get();
> > +       struct device *dev;
> >         int error = 0;
> >
> >         trace_suspend_resume(TPS("dpm_suspend"), state.event, true);
> > @@ -1816,12 +1869,28 @@
> >
> >         mutex_lock(&dpm_list_mtx);
> >
> > +       /*
> > +        * Start processing "async" leaf devices upfront because they don't need
> > +        * to wait.
> > +        */
> > +       list_for_each_entry_reverse(dev, &dpm_prepared_list, power.entry) {
> > +               dpm_clear_async_state(dev);
> > +               if (dpm_leaf_device(dev))
> > +                       dpm_async_with_cleanup(dev, async_suspend);
> > +       }
> > +
> >         while (!list_empty(&dpm_prepared_list)) {
> > -               struct device *dev = to_device(dpm_prepared_list.prev);
> > +               dev = to_device(dpm_prepared_list.prev);
> >
> >                 list_move(&dev->power.entry, &dpm_suspended_list);
> >
> > -               dpm_clear_async_state(dev);
> > +               /*
> > +                * Move all devices to the target list to resume them properly
> > +                * on errors.
> > +                */
>
> I did this initially on my end, but we have so many devices that
> looping through them had a measurable impact.

Which I guess is super-important for error handling.  Come on.

> It's better to just splice the lists on error.

On top of this change, yes.
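
As a rough illustration of the splice idea, the sketch below models
the two lists in userspace with a minimal circular doubly linked list.
The toy_list_*() helpers are hypothetical and only mimic the general
shape of the kernel's list primitives; they are not the code in
main.c, and how the error path would be structured around the splice
is an assumption.

```c
#include <assert.h>

/* Minimal circular doubly linked list, in the style of list_head. */
struct toy_list {
	struct toy_list *next, *prev;
};

static void toy_list_init(struct toy_list *head)
{
	head->next = head;
	head->prev = head;
}

static int toy_list_empty(const struct toy_list *head)
{
	return head->next == head;
}

/* Insert @node right after @head, i.e. at the front of the list. */
static void toy_list_add(struct toy_list *node, struct toy_list *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void toy_list_del(struct toy_list *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/* Unlink @node and put it at the front of @head. */
static void toy_list_move(struct toy_list *node, struct toy_list *head)
{
	toy_list_del(node);
	toy_list_add(node, head);
}

/* Move all of @src to the front of @dst in O(1) and leave @src empty. */
static void toy_list_splice_init(struct toy_list *src, struct toy_list *dst)
{
	struct toy_list *first, *last;

	if (toy_list_empty(src))
		return;

	first = src->next;
	last = src->prev;

	first->prev = dst;
	last->next = dst->next;
	dst->next->prev = last;
	dst->next = first;

	toy_list_init(src);
}
```

Because entries are taken from the tail of the source list and moved
to the front of the target, splicing the leftover entries onto the
front of the target keeps the original order intact, and its cost does
not depend on how many devices are left.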