Message-ID: <CAJZ5v0jWTtaQEcx0p+onU3eujgAJpF_V57wzZCuYv2NVnEb7VQ@mail.gmail.com>
Date: Fri, 2 May 2025 22:33:12 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Jon Hunter <jonathanh@...dia.com>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>, Linux PM <linux-pm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>, Alan Stern <stern@...land.harvard.edu>,
Ulf Hansson <ulf.hansson@...aro.org>, Johan Hovold <johan@...nel.org>,
Manivannan Sadhasivam <manivannan.sadhasivam@...aro.org>, Saravana Kannan <saravanak@...gle.com>,
"linux-tegra@...r.kernel.org" <linux-tegra@...r.kernel.org>
Subject: Re: [PATCH v3 1/5] PM: sleep: Resume children after resuming the parent
Hi Jon,
On Thu, May 1, 2025 at 11:51 AM Jon Hunter <jonathanh@...dia.com> wrote:
>
> Hi Rafael,
>
> On 14/03/2025 12:50, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> >
> > According to [1], the handling of device suspend and resume, and
> > particularly the latter, involves unnecessary overhead related to
> > starting new async work items for devices that cannot make progress
> > right away because they have to wait for other devices.
> >
> > To reduce this problem in the resume path, use the observation that
> > starting the async resume of the children of a device after resuming
> > the parent is likely to produce less scheduling and memory management
> > noise than starting it upfront while at the same time it should not
> > increase the resume duration substantially.
> >
> > Accordingly, modify the code to start the async resume of the device's
> > children when the processing of the parent has been completed in each
> > stage of device resume and only start async resume upfront for devices
> > without parents.
> >
> > Also make it check if a given device can be resumed asynchronously
> > before starting the synchronous resume of it in case it will have to
> > wait for another that is already resuming asynchronously.
> >
> > In addition to making the async resume of devices more friendly to
> > systems with relatively fewer computing resources, this change is also
> > preliminary for analogous changes in the suspend path.
> >
> > On the systems where it has been tested, this change by itself does
> > not affect the overall system resume duration in a measurable way.
> >
> > Link: https://lore.kernel.org/linux-pm/20241114220921.2529905-1-saravanak@google.com/ [1]
> > Suggested-by: Saravana Kannan <saravanak@...gle.com>
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>
>
> I have noticed a suspend regression with -next on a couple of our Tegra
> boards. Bisect was pointing to the following merge commit ...
>
> # first bad commit: [218a7bbf861f83398ac9767620e91983e36eac05] Merge
> branch 'pm-sleep' into linux-next
>
> On top of next-20250429, I found that suspend works again after reverting
> the following changes ...
>
> Revert "PM: sleep: Resume children after resuming the parent"
> Revert "PM: sleep: Suspend async parents after suspending children"
> Revert "PM: sleep: Make suspend of devices more asynchronous"
I see.
Do all three commits need to be reverted to make things work again?
The first one only touches the resume path, so it would be surprising
if it caused a suspend regression to occur.
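To make the resume-path change in the first commit concrete, here is a toy, single-threaded C model. It is not the kernel's code in drivers/base/power/main.c: a FIFO stands in for the async work queue, only parentless devices are queued upfront, and a device's children are queued right after that device has "resumed". All names (resume_children_after_parent(), parents_first(), MAX_DEVS) are hypothetical, and the model ignores "sync" devices and the sync-resume check mentioned in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code.  parent[i] is the index of
 * device i's parent, or -1 for a device without a parent. */
#define MAX_DEVS 64

/* Fill order[] with the sequence in which devices get resumed and return
 * how many were resumed.  Only parentless devices are queued upfront; the
 * children of a device are queued right after it has resumed, so no queued
 * item ever has to wait for its parent. */
int resume_children_after_parent(const int *parent, int n, int *order)
{
    int queue[MAX_DEVS];
    int head = 0, tail = 0, done = 0;

    assert(n >= 0 && n <= MAX_DEVS);

    /* Start resume upfront only for devices without parents. */
    for (int i = 0; i < n; i++)
        if (parent[i] < 0)
            queue[tail++] = i;

    while (head < tail) {
        int dev = queue[head++];
        order[done++] = dev;              /* "resume" dev */

        /* dev is done: now start resuming its children. */
        for (int c = 0; c < n; c++)
            if (parent[c] == dev)
                queue[tail++] = c;
    }
    return done;
}

/* True if every device appears after its parent in order[].
 * Assumes order[] is a permutation of 0..n-1. */
bool parents_first(const int *parent, int n, const int *order)
{
    int pos[MAX_DEVS];

    for (int i = 0; i < n; i++)
        pos[order[i]] = i;
    for (int i = 0; i < n; i++)
        if (parent[i] >= 0 && pos[parent[i]] > pos[i])
            return false;
    return true;
}
```

With parent[] = { -1, 0, 0, 1, -1 }, the model resumes devices in the order 0, 4, 1, 2, 3; every queued item can run as soon as it is picked up, instead of being queued early only to block on its parent.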
The commit most likely to cause this issue is the second one, because it
effectively changes the suspend ordering for "async" devices.
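The title of the second commit, "Suspend async parents after suspending children", suggests the mirror image of the resume change. The toy C model below is based only on that title, and the actual patch may differ in detail: devices without children are queued first, and a parent is queued when its last child has finished suspending. The names (suspend_parents_after_children(), children_first(), MAX_DEVS) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code.  parent[i] is the index of
 * device i's parent, or -1 for a device without a parent. */
#define MAX_DEVS 64

/* Fill order[] with the suspend sequence and return how many devices were
 * suspended.  Leaves start right away; a parent is queued only when its
 * last child has been suspended. */
int suspend_parents_after_children(const int *parent, int n, int *order)
{
    int pending[MAX_DEVS] = { 0 };   /* children not yet suspended */
    int queue[MAX_DEVS];
    int head = 0, tail = 0, done = 0;

    assert(n >= 0 && n <= MAX_DEVS);

    for (int i = 0; i < n; i++)
        if (parent[i] >= 0)
            pending[parent[i]]++;

    /* Devices without children can start suspending immediately. */
    for (int i = 0; i < n; i++)
        if (pending[i] == 0)
            queue[tail++] = i;

    while (head < tail) {
        int dev = queue[head++];
        order[done++] = dev;             /* "suspend" dev */

        int p = parent[dev];
        /* The last child to finish kicks off its parent's suspend. */
        if (p >= 0 && --pending[p] == 0)
            queue[tail++] = p;
    }
    return done;
}

/* True if every device is suspended before its parent.
 * Assumes order[] is a permutation of 0..n-1. */
bool children_first(const int *parent, int n, const int *order)
{
    int pos[MAX_DEVS];

    for (int i = 0; i < n; i++)
        pos[order[i]] = i;
    for (int i = 0; i < n; i++)
        if (parent[i] >= 0 && pos[i] > pos[parent[i]])
            return false;
    return true;
}
```

For parent[] = { -1, 0, 0, 1, -1 } the model yields 2, 3, 4, 1, 0. Note that the position of device 4 relative to the others is fixed only by queue order: the model constrains parent/child pairs and nothing else, so any other dependency has to be expressed separately.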
> I have been looking into this a bit more to see what device is failing
> and by adding a bit of debug I found that entry to suspend was failing
> on the Tegra194 Jetson AGX Xavier (tegra194-p2972-0000.dts) platform
> when one of the I2C controllers (i2c@...0000) was being suspended.
>
> I found that if I disable only this I2C controller in device-tree,
> suspend works again on top of -next. This I2C controller has 3 devices
> on the platform; two ina3221 devices and one Cypress Type-C controller.
> I then found that removing only the two ina3221 devices (in
> tegra194-p2888.dtsi) also allows suspend to work.
>
> At this point, I am still unclear why this is now failing. If you have
> any thoughts or things I can try please let me know.
So are the devices in question "async"? To check this, please see the
"async" attribute in the "power" subdirectory of the sysfs device
directory for each of them.
If they are "async", you can write "disabled" to this attribute to turn
them into "sync" devices. I'd do this and see what happens.
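The check-and-disable step above can be scripted; below is a minimal C sketch. The function names are hypothetical, and the path passed in is a placeholder such as /sys/bus/i2c/devices/<bus>-<addr>/power/async, because the bus number and addresses of the ina3221 devices are not given in the thread. The kernel's sysfs ABI documentation lists "enabled" and "disabled" as the values of this attribute, and writing it requires root.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read a power/async attribute ("enabled" or "disabled") into buf,
 * stripping the trailing newline.  Returns 0 on success, -1 on error. */
int read_async_attr(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL && len > 0);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Write "disabled" to the attribute so the PM core treats the device
 * as "sync".  Returns 0 on success, -1 on error. */
int disable_async(const char *path)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs("disabled\n", f) >= 0;
    /* The buffered write reaches the file when fclose() flushes it, so a
     * rejected value is reported as an fclose() failure. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling disable_async() for each ina3221 device before suspending is equivalent to running `echo disabled > .../power/async` for it from a shell.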
Overall, it looks like some dependencies aren't properly represented
by device links on this platform.
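Device links, created in the kernel with device_link_add(consumer, supplier, flags), make the PM core suspend consumers before their suppliers, just as children are suspended before parents. The toy C model below (hypothetical names, not kernel code) treats both relations as (consumer, supplier) pairs and orders them with a Kahn-style topological sort. Device 0 is a hypothetical supplier of the sensor (device 2), which sits under an I2C controller (device 1); the thread does not identify any such supplier.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEVS 64

/* One ordering constraint: consumer must suspend before supplier.
 * A child is the consumer of its parent; device links add more pairs. */
struct dep { int consumer, supplier; };

/* Kahn-style topological sort: a device is queued once every device that
 * depends on it has been suspended.  Returns the number of devices placed
 * in order[]; a value below n means the constraints form a cycle. */
int suspend_order(int n, const struct dep *deps, int ndeps, int *order)
{
    int pending[MAX_DEVS] = { 0 };   /* consumers not yet suspended */
    int queue[MAX_DEVS];
    int head = 0, tail = 0, done = 0;

    assert(n >= 0 && n <= MAX_DEVS);

    for (int i = 0; i < ndeps; i++)
        pending[deps[i].supplier]++;

    for (int i = 0; i < n; i++)
        if (pending[i] == 0)
            queue[tail++] = i;

    while (head < tail) {
        int dev = queue[head++];
        order[done++] = dev;
        for (int i = 0; i < ndeps; i++)
            if (deps[i].consumer == dev && --pending[deps[i].supplier] == 0)
                queue[tail++] = deps[i].supplier;
    }
    return done;
}

/* True if device a comes before device b in order[0..n-1]. */
bool before(const int *order, int n, int a, int b)
{
    for (int i = 0; i < n; i++) {
        if (order[i] == a)
            return true;
        if (order[i] == b)
            return false;
    }
    return false;
}
```

With only the parent/child pair { 2, 1 }, nothing stops the hypothetical supplier 0 from being suspended first (order 0, 2, 1). Adding the link { 2, 0 } forces 2, 1, 0. In the real async case the relative order of unconstrained devices depends on timing and on the ordering policy, which is why a change in that policy can expose a dependency that was never expressed.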
Thanks, Rafael