lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <PH7PR11MB635419A4AC11533F8F5791E493812@PH7PR11MB6354.namprd11.prod.outlook.com>
Date: Fri, 16 Aug 2024 08:33:39 +0000
From: "Chang, Kaiyen" <kaiyen.chang@...el.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
CC: "pavel@....cz" <pavel@....cz>, "Brown, Len" <len.brown@...el.com>,
	"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
	"linux-pm@...r.kernel.org" <linux-pm@...r.kernel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Todd Brandt
	<todd.e.brandt@...ux.intel.com>, "Lee, Chiasheng" <chiasheng.lee@...el.com>
Subject: RE: [PATCH v2 1/1] PM: Start asynchronous suspend threads upfront


On Tue, July 2, 2024 at 09:20:24PM +0200, Rafael J. Wysocki <rafael@...nel.org> wrote:
> 
> On Thu, Jun 20, 2024 at 11:21 AM Chang, Kaiyen <kaiyen.chang@...el.com>
> wrote:
> >
> > On Tue, Jun 18, 2024 at 07:27:18PM +0200, Rafael J. Wysocki
> <rafael@...nel.org> wrote:
> > >
> > > On Tue, Jun 18, 2024 at 2:13 PM Kaiyen Chang
> > > <kaiyen.chang@...el.com>
> > > wrote:
> > > >
> > > > Currently, when performing a suspend operation, all devices on the
> > > > dpm_list must wait for the "synchronous" devices that are listed
> > > > after them to complete before the main suspend thread can start
> > > > their suspend routines, even if they are "asynchronous". If the
> > > > suspend routine of a synchronous device must enter a waiting state
> > > > for some reason, it will cause the main suspend thread to go to
> > > > sleep, thereby delaying the processing of all subsequent devices,
> > > > including asynchronous ones, ultimately extending the overall suspend
> time.
> > > >
> > > > By starting the asynchronous suspend threads upfront, we can allow
> > > > the system to handle the suspend routines of these asynchronous
> > > > devices in parallel, without waiting for the suspend routines of
> > > > the synchronous devices listed after them to complete, and without
> > > > breaking their order with respect to their parents and children.
> > > > This way, even if the main suspend thread is blocked, these
> > > > asynchronous suspend threads can continue to run without being
> > > > affected.
> > > >
> > > > Signed-off-by: Kaiyen Chang <kaiyen.chang@...el.com>
> > >
> > > How exactly has this been tested?
> > >
> > > In the past, changes going in this direction caused system suspend
> > > to hang on at least some platforms (including the ones in my office).
> > >
> >
> > Hi Rafael, thanks for your reply.
> >
> > 1. We have currently verified this patch on the ADL-N Chromebook, running
> the suspend stress test 1000 times consecutively without encountering any
> issues.
> 
> Just one machine?  That's hardly sufficient.
> 

We have verified this patch on JSL, ADL-P, ADL-N, RPL, and MTL Chromebooks.
All devices can pass the suspend stress test 1000 times consecutively.

> > 2. Could you tell us what was the reason for suspend to hang back then?
> 
> It was what is still the case I think: Some dependencies between devices are
> implicitly taken into account when the async suspend threads are not started
> upfront.
> 
> > The reason why I believe that we can start the async device's suspend
> > threads upfront is that, during resume, a sync devices listed after
> > certain async devices on the list do not actually rely on the implicit
> > dependency provided by the order of the devices on the list.
> 
> This problem has nothing to do with resume, because the machines hang
> during suspend.
> 
> > This is because if there is no consumer-supplier or parent-children
> > relationship between these devices, the suspend routine of this sync
> > device will be executed directly without waiting for the suspend
> > routines of the async devices listed before it to complete. In other words,
> this sync device does not depend on the async devices listed before them.
> 
> Well, not quite, IIUC.
> 
> Say there are two devices, A and B, the former is "sync" and the latter is
> "async".  Say there is a dependency between them such that A must suspend
> before B, but it is not expressed directly (no parent-child relationship and no
> device link).
> 
> If A is before B in the list order and async suspend is not started upfront, the
> dependency is met.  If async suspend is started upfront, B may suspend
> before A.
> 

Here are two instances where I mistakenly wrote "suspend" instead of
"resume." Please allow me to correct them as follows:

Before correction:
----------------------------------------------------------------------
This is because if there is no consumer-supplier or parent-children
relationship between these devices, the *suspend* routine of this sync
device will be executed directly without waiting for the *suspend*
routines of the async devices listed before it to complete.

After correction:
----------------------------------------------------------------------
This is because if there is no consumer-supplier or parent-children
relationship between these devices, the *resume* routine of this sync
device will be executed directly without waiting for the *resume*
routines of the async devices listed before it to complete.

Here’s what I want to convey:

The commit 5af84b82701a "PM: Asynchronous suspend and resume of devices"
seems to be based on the assumption that, except for devices with
consumer-supplier or parent-children relationships, no sync device depends
on any async devices. This is because only devices with consumer-supplier
or parent-children relationships would wait for the async routines to
complete.

In the example you previously provided, if A (sync) must suspend before
B (async), it indicates that A depends on B (i.e., A needs B). Therefore,
during the resume phase, A should wait for B's resume routine to complete
before starting its resume routine. However, in commit 5af84b82701a, only
devices with consumer-supplier or parent-children relationships would wait
for the async routine to finish, while the other sync devices would not. This
makes me think that, in fact, the sync devices without clearly defined
dependencies do not actually need any async devices to be ready. Otherwise,
there would be a potential risk here.

> > Therefore, during suspend, there is no need to ensure those async
> > devices listed before the sync devices are awake before these sync
> > devices complete their suspend routines. In summary, as long as we
> > manage the dependencies between consumers/suppliers and
> > parents/children relationships properly, we should be able to start the async
> devices' suspend routines upfront during suspend.
> 
> Well, if the dependencies are all represented either through parent-child
> relationships or through device links, then yes, but are they?  On all systems
> currently in the field?
> 
> So if you want to move this forward, please talk to Todd Brandt (CCed) to
> arrange for testing your patch in the client power lab.  It must at least not
> break suspend on any machines in there.

Thank you for your suggestion. We've requested Todd's assistance with the
testing. The results show that after applying this patch, all the machines
in the lab encountered no issues. Could you please advise on the next steps?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ