[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGETcx_C_UcjjPOfUip=L28P3PWjMvmSc4nZJ5ML=tScUXfk2w@mail.gmail.com>
Date: Mon, 8 Sep 2025 19:00:04 -0700
From: Saravana Kannan <saravanak@...gle.com>
To: tuhaowen <tuhaowen@...ontech.com>
Cc: wusamuel@...gle.com, rafael@...nel.org, len.brown@...el.com,
pavel@...nel.org, linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
huangbibo@...ontech.com
Subject: Re: Sync timeout mechanisms - Request for coordination
Hi Haowen,
On Sun, Sep 7, 2025 at 7:47 PM tuhaowen <tuhaowen@...ontech.com> wrote:
>
> Hi Samuel and Saravana,
>
> I hope this email finds you well. I'm reaching out regarding the sync
> timeout mechanisms for suspend/hibernation that we've both been working on.
I hope you are well too and thanks for reaching out to us.
> Rafael from the upstream kernel has indicated that he would prefer us to
> coordinate our approaches rather than having two separate implementations.
> He mentioned your patch series "PM: Support aborting suspend during
> filesystem sync" and suggested we work together on a unified solution.
>
> I would like to discuss how we can merge our approaches. Below is a
> summary of my current implementation:
>
> **My approach - Time-based timeout mechanism:**
> - Introduces a configurable timeout for sync operations during both
> suspend and hibernation
> - Uses kthread with wait_for_completion_timeout() to implement timeout
> - Provides sysfs interface /sys/power/sleep_sync_timeout for runtime
> configuration
> - Default behavior unchanged (timeout disabled by default)
> - Addresses scenarios where sync is excessively slow without wakeup events
This doesn't really fix our issue where we want to abort suspend only
if we have to stay awake. If there's no need to abort the suspend (to
deal with some user/wake source request), then it's okay to take the
time to sync and then suspend. If you abort the suspend, it's just
going to get attempted again and in fact waste power.
I also don't understand how your patch fixes anything. If anything
this makes things worse because the user might have expected their
device to have hibernated only to come back hours later to see their
battery dead. And even if the user is actively monitoring it, you
still need to sync the file system fully before you eventually
suspend. So, this doesn't really save any time or the need to sync.
>
> You can see the detailed implementation and Rafael's feedback at:
> https://lore.kernel.org/linux-pm/CAJZ5v0jBRy=CvZiWQQaorvc-zT+kkaB6+S2TErGmkaPAGmHLOQ@mail.gmail.com/
>
> **Key differences I see between our approaches:**
> 1. Your solution focuses on aborting sync when wakeup events occur
> 2. My solution addresses cases where there are no wakeup events but sync
> is excessively slow (e.g., slow/faulty storage)
> 3. Your approach uses workqueue + completion, mine uses kthread + timeout
I don't think the workqueue vs kthread matters -- trivial
implementation detail. The important point is when we want to abort.
> 4. Both aim to prevent indefinite hangs but cover different scenarios
I don't see the point of your change though. Why abort a suspend if
there's no need to wake up? I think whatever use case issue you are
hitting, it's more of an issue of not grabbing a wake source when it's
needed.
Can you explain the actual real world issue you are trying to fix? If
it's the UI hanging and not responding to keypress/mouse move, then to
me it looks like those should be marked as wake sources.
>
> **Potential unified approach:**
> I believe both mechanisms could complement each other:
> - Event-driven abort (your approach) for responsive wakeup handling
> - Time-based timeout (my approach) for proactive protection against
> slow storage
> - Single, unified implementation in kernel/power/main.c
>
> Would you be interested in discussing how we can combine these approaches?
> I'm open to:
> 1. Merging the implementations into a single solution
> 2. Adopting your workqueue approach with added timeout functionality
> 3. Any other technical approach you think would work better
I'm not convinced adding a timeout is the right solution. It just adds
another point of failure and the need for a retry mechanism.
However, if you are really set on proving the need for a timeout and
implementing it, you can always add it as a separate patch after our
patch lands. You can set up a timer to call pm_system_wakeup(). Or
just grab a wake source after a time period.
In fact, thinking more about this, you are generally having a problem
with suspend taking too long to complete. Doesn't matter if it's due
to file system sync or anything else. In which case, you should just
use a timer to call pm_system_wakeup() and not fix just file system
sync (which happens to be the current long pole).
> Looking forward to your thoughts and collaboration.
Hope my response helps.
-Saravana
Powered by blists - more mailing lists