Message-Id: <20251020021228.2336745-1-tuhaowen@uniontech.com>
Date: Mon, 20 Oct 2025 10:12:28 +0800
From: tuhaowen <tuhaowen@...ontech.com>
To: rafael@...nel.org
Cc: dakr@...nel.org,
gregkh@...uxfoundation.org,
kernel-team@...roid.com,
lenb@...nel.org,
linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org,
pavel@...nel.org,
saravanak@...gle.com,
wusamuel@...gle.com
Subject: Re: [PATCH v4] PM: Support aborting sleep during filesystem sync
Hi Rafael,
Thank you for your attention to this matter. I'd like to clarify the
difference between our approach and the Google team's solution, as they
address fundamentally different use cases and environments.
On Mon, Oct 13, 2025 at 8:02 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
> > No, it's different. We don't mind a long filesystem sync if we don't
> > have a need to abort a suspend. If it takes 25 seconds to sync the
> > filesystem but there's no need to abort it, that's totally fine. So,
> > this patch is just about allowing abort to happen without waiting for
> > file system sync to finish.
Saravana is correct that our problems are different. Let me explain the
key distinction:
**Google's approach (Mobile/Android focus)**:
- Problem: Unnecessary wake-ups during sync operations waste battery
- Solution: Abort sync only when wakeup events occur (user interaction)
- Philosophy: Wait indefinitely for the sync to finish if no user action is required
- Use case: Mobile devices where users expect to press the power button to wake the device
**Our approach (Desktop/PC focus)**:
- Problem: Indefinite sync hangs leave users with an unresponsive black screen
- Solution: Proactive timeout to prevent the system from appearing frozen
- Philosophy: Provide user feedback and recover the system within a reasonable time
- Use case: Desktop/laptop where users expect immediate system response
> > The other patch's requirement is to always abort if suspend takes 25
> > seconds (or whatever the timeout is). IIRC, in his case, it's because
> > of a bad disk or say a USB disk getting unplugged. I'm not convinced a
> > suspend timeout is the right thing to do, but I'm not going to nack
> > it. But to implement his requirement, he can put a patch on top of
> > ours where he sets a timer and then aborts suspends if it fires.
The key difference is **when** we need to abort:
- Google: Abort when user wants to wake up (reactive)
- UnionTech: Abort when sync becomes pathologically slow (proactive)
For desktop users, a 25-second black screen with no feedback creates the
impression of a system freeze, especially when caused by removed USB
devices or failing storage. Users cannot distinguish between "system is
syncing" and "system has crashed" without feedback.
**Question about integration**:
Since both approaches serve legitimate but different needs, could we
implement a unified solution that supports both mechanisms? For example:
1. **Combined approach**: Implement both wakeup-based abort (Google's patch)
   and timeout-based abort (our patch) in the same framework
2. **Configuration via sysfs**: Add a node to control the behavior:
   - `/sys/power/sync_abort_mode`:
     - "wakeup-only": Use Google's approach (abort only on wakeup events)
     - "timeout": Use our approach (abort on timeout)
     - "both": Use both mechanisms (abort on either condition)
3. **Default behavior**: The node could default to "wakeup-only" for
   mobile/embedded systems and "timeout" for desktop systems, or
   distributions could choose
This would allow different systems to choose appropriate behavior based
on their needs.
Would this unified approach be acceptable? We're happy to work on
implementation details with the Google team to ensure both use cases
are properly addressed.
**Additional concern about integration timing**:
I noticed that Samuel Wu previously mentioned in his response to you
(Sep 30, 2025) that our approaches could be "decoupled" and that I could
"build changes on top of theirs." However, after reviewing their v4 patch
implementation, I'm concerned that if their approach lands first, it may
make our timeout-based solution significantly more difficult to integrate.
Their current implementation:
- Uses workqueue + completion for sync operations
- Introduces pm_sleep_fs_sync() as the main interface
- Adds complex state management for back-to-back sleep attempts
This architecture makes it challenging to integrate our timeout approach
and add mode switching functionality. If their patch lands first, adding
our timeout mechanism would require:
- Modifying their workqueue-based sync mechanism to support timeout
- Adding logic to coordinate between workqueue completion and timeout
- Implementing mode switching between wakeup-abort and timeout-abort
- Ensuring proper interaction between the two abort mechanisms
The main challenge is: how do we add timeout functionality to their
workqueue + completion design? And how do we implement clean switching
between "abort on wakeup events" mode versus "abort on timeout" mode?
Their current design focuses solely on wakeup-based abort, so retrofitting
timeout support and mode selection would require significant changes to
their implementation.
Would it be possible to consider both approaches simultaneously to ensure
a clean integration path? This might result in a better unified solution
than trying to retrofit timeout functionality into their workqueue-based
implementation.
Best regards,
Haowen Tu