linux-kernel - Re: [PATCH v2 1/3] PM: sleep: Resume children after resuming the parent

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0j_exikLyS7MeZT75hD-8y1uwJdYQvDjwepCOmow506Zw@mail.gmail.com>
Date: Thu, 13 Mar 2025 23:29:41 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Saravana Kannan <saravanak@...gle.com>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, Linux PM <linux-pm@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, Alan Stern <stern@...land.harvard.edu>, 
	Ulf Hansson <ulf.hansson@...aro.org>, Johan Hovold <johan@...nel.org>, 
	Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>, Jon Hunter <jonathanh@...dia.com>
Subject: Re: [PATCH v2 1/3] PM: sleep: Resume children after resuming the parent

On Thu, Mar 13, 2025 at 10:17 PM Saravana Kannan <saravanak@...gle.com> wrote:
>
> On Thu, Mar 13, 2025 at 1:35 PM Rafael J. Wysocki <rjw@...ysocki.net> wrote:
> >
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > According to [1], the handling of device suspend and resume, and
> > particularly the latter, involves unnecessary overhead related to
> > starting new async work items for devices that cannot make progress
> > right away because they have to wait for other devices.
> >
> > To reduce this problem in the resume path, use the observation that
> > starting the async resume of the children of a device after resuming
> > the parent is likely to produce less scheduling and memory management
> > noise than starting it upfront while at the same time it should not
> > increase the resume duration substantially.
> >
> > Accordingly, modify the code to start the async resume of the device's
> > children when the processing of the parent has been completed in each
> > stage of device resume and only start async resume upfront for devices
> > without parents.
> >
> > Also make it check if a given device can be resumed asynchronously
> > before starting the synchronous resume of it in case it will have to
> > wait for another that is already resuming asynchronously.
> >
> > In addition to making the async resume of devices more friendly to
> > systems with relatively less computing resources, this change is also
> > preliminary for analogous changes in the suspend path.
> >
> > On the systems where it has been tested, this change by itself does
> > not affect the overall system resume duration in a significant way.
> >
> > Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-1-saravanak@google.com/ [1]
> > Suggested-by: Saravana Kannan <saravanak@...gle.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > ---
> >
> > v1 -> v2:
> >    Use a separate lock for power.work_in_progress protection which should
> >    reduce lock contention on dpm_list_mtx.
> >
> > ---
> >  drivers/base/power/main.c |   80 +++++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 61 insertions(+), 19 deletions(-)
> >
> > --- a/drivers/base/power/main.c
> > +++ b/drivers/base/power/main.c
> > @@ -63,6 +63,7 @@
> >  static DEFINE_MUTEX(dpm_list_mtx);
> >  static pm_message_t pm_transition;
> >
> > +static DEFINE_MUTEX(async_wip_mtx);
>
> I think (not sure) this can be a spinlock.

No, it can't, because async_schedule_dev_nocall() allocates memory
with GFP_KERNEL.

> >  static int async_error;
> >
> >  static const char *pm_verb(int event)
> > @@ -597,8 +598,11 @@
> >                 && !pm_trace_is_enabled();
> >  }
> >
> > -static bool dpm_async_fn(struct device *dev, async_func_t func)
> > +static bool __dpm_async(struct device *dev, async_func_t func)
> >  {
> > +       if (dev->power.work_in_progress)
> > +               return true;
> > +
> >         if (!is_async(dev))
> >                 return false;
> >
> > @@ -611,14 +615,37 @@
> >
> >         put_device(dev);
> >
> > +       return false;
> > +}
> > +
> > +static bool dpm_async_fn(struct device *dev, async_func_t func)
> > +{
> > +       guard(mutex)(&async_wip_mtx);
> > +
> > +       return __dpm_async(dev, func);
> > +}
> > +
> > +static int dpm_async_with_cleanup(struct device *dev, void *fn)
> > +{
> > +       guard(mutex)(&async_wip_mtx);
> > +
> > +       if (!__dpm_async(dev, fn))
> > +               dev->power.work_in_progress = false;
> > +
> > +       return 0;
> > +}
> > +
> > +static void dpm_async_resume_children(struct device *dev, async_func_t func)
> > +{
> >         /*
> > -        * async_schedule_dev_nocall() above has returned false, so func() is
> > -        * not running and it is safe to update power.work_in_progress without
> > -        * extra synchronization.
> > +        * Start processing "async" children of the device unless it's been
> > +        * started already for them.
> > +        *
> > +        * This could have been done for the device's "async" consumers too, but
> > +        * they either need to wait for their parents or the processing has
> > +        * already started for them after their parents were processed.
> >          */
> > -       dev->power.work_in_progress = false;
> > -
> > -       return false;
> > +       device_for_each_child(dev, func, dpm_async_with_cleanup);
>
> Continuing my comments from v1 here, it's not a good assumption to
> make that the child can start resuming just because the parent has
> finished resuming.

Well, there is one dependency less for it, so it makes sense to start
async processing for it because it may be ready.  It doesn't have to
be ready for sure, but if it's not ready, it'll wait.

I know that you want to avoid the waiting at all, but I'm not
convinced that this is necessary and even that this is the right
approach and I'm afraid the only way to convince me would be to
demonstrate it.

Explicit dependency tracking here would be extra overhead and
complexity and I'm not going to do it until I know that it is actually
worth the hassle.

> In my example, I have 386 device links and ~600
> devices that have some sort of suspend ops (I think the total device
> node count is ~1700).

However, some of those device links are between parents and children
IIRC.  If so, then how many?

> I'm even more confused by why you think resume needs to be asymmetric
> with suspend. In suspend, you kick off all the suppliers too when a
> device is done suspending, but in resume you don't kick off all the
> consumers.

This is based on the assumption that the majority of devices will have
a parent and since the diameter of the graph is less than the number
of devices, there are devices with multiple children whereas each
device can only have one parent.

Quite likely there are also devices with multiple consumers, but how
many devices with multiple suppliers are there?

> >  }
> >
> >  static void dpm_clear_async_state(struct device *dev)
> > @@ -627,6 +654,8 @@
> >         dev->power.work_in_progress = false;
> >  }
> >
> > +static void async_resume_noirq(void *data, async_cookie_t cookie);
> > +
> >  /**
> >   * device_resume_noirq - Execute a "noirq resume" callback for given device.
> >   * @dev: Device to handle.
> > @@ -710,6 +739,8 @@
> >                 dpm_save_failed_dev(dev_name(dev));
> >                 pm_dev_err(dev, state, async ? " async noirq" : " noirq", error);
> >         }
> > +
> > +       dpm_async_resume_children(dev, async_resume_noirq);
> >  }
> >
> >  static void async_resume_noirq(void *data, async_cookie_t cookie)
> > @@ -733,19 +764,20 @@
> >         mutex_lock(&dpm_list_mtx);
> >
> >         /*
> > -        * Trigger the resume of "async" devices upfront so they don't have to
> > -        * wait for the "non-async" ones they don't depend on.
> > +        * Start processing "async" devices without parents upfront so they
> > +        * don't wait for the "sync" devices they don't depend on.
> >          */
> >         list_for_each_entry(dev, &dpm_noirq_list, power.entry) {
> >                 dpm_clear_async_state(dev);
> > -               dpm_async_fn(dev, async_resume_noirq);
> > +               if (!dev->parent)
>
> This check isn't sufficient. There are plenty of devices where they
> have no parent, but they have many suppliers.

Again, how many?

> That's the reality in the DT world. And when you do deeper down the tree
> (dpm_async_resume_children), the children typically have more
> suppliers.

The numbers you gave above don't seem to support this, though, and
even so, by the time the parent has resumed, it is quite likely that
the suppliers have resumed either.

> You can also check for "no suppliers" to find the true leaf nodes and
> start with them, but that means you also have to kick off the
> consumers when you finish your resume.

That can be done, but how much of a difference will it really make?
You are saying "a lot", but this is not really quantitative.

> We definitely need to check
> device links for this patchset to be useful for me. With my patch
> series, it's effectively just NCPU kworkers running continuously on
> each CPU. Won't be the case if we don't check the device links. And as
> I said before, the overhead isn't just about context switches, but
> also forking more kworker threads.

What about comparing this series with your series on the system in
question and sharing the numbers?  You could easily get a stake in the
ground this way.

Cheers, Rafael