Message-ID: <CAG2KctpkwfO2sGg_mQLSUKrP5Zmd=sTMeSgT5K6ZFgZ-pzO4LA@mail.gmail.com>
Date: Wed, 19 Nov 2025 09:10:14 -0800
From: Samuel Wu <wusamuel@...gle.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Len Brown <lenb@...nel.org>, Pavel Machek <pavel@...nel.org>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>, Danilo Krummrich <dakr@...nel.org>, tuhaowen@...ontech.com,
Saravana Kannan <saravanak@...gle.com>, kernel-team@...roid.com, linux-pm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6] PM: Support aborting sleep during filesystem sync
On Sat, Nov 15, 2025 at 9:35 AM Rafael J. Wysocki <rafael@...nel.org> wrote:
>
> On Wednesday, November 12, 2025 8:38:45 PM CET Samuel Wu wrote:
> > On Fri, Nov 7, 2025 at 1:15 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
> > >
> > > On Fri, Nov 7, 2025 at 9:58 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
> > > >
> > > > On Wed, Nov 5, 2025 at 2:20 AM Samuel Wu <wusamuel@...gle.com> wrote:
> > > > >
> > > > > On Tue, Nov 4, 2025 at 10:52 AM Rafael J. Wysocki <rafael@...nel.org> wrote:
> > > > > >
> > > > > > On Thu, Oct 30, 2025 at 10:01 PM Samuel Wu <wusamuel@...gle.com> wrote:
> > > > > > >
> > > > > > > At the start of suspend and hibernate, filesystems will sync to save the
> > > > > > > current state of the device. However, the long tail of the filesystem
> > > > > > > sync can take upwards of 25 seconds. If during this filesystem sync
> > > > > > > there is some wakeup or abort signal, it will not be processed until the
> > > > > > > sync is complete; from a user's perspective, this looks like the device
> > > > > > > is unresponsive to any form of input.
> > > > > > >
> > > > > > > This patch adds functionality to handle a sleep abort signal when in
> > > > > > > the filesystem sync phase of suspend or hibernate. This topic was first
> > > > > > > discussed by Saravana Kannan at LPC 2024 [1], where the general
> > > > > > > consensus was to allow filesystem sync on a parallel thread. In case of
> > > > > > > abort, the suspend process will stop waiting on an in-progress
> > > > > > > filesystem sync, and continue by aborting suspend before the filesystem
> > > > > > > sync is complete.
> > > > > > >
> > > > > > > Additionally, there is extra care needed to account for back-to-back
> > > > > > > sleeps while maintaining functionality to immediately abort during the
> > > > > > > filesystem sync stage. Furthermore, in the case of the back-to-back
> > > > > > > sleeps, a subsequent filesystem sync is needed to ensure the latest
> > > > > > > files are synced right before sleep. If necessary, a subsequent sleep's
> > > > > > > filesystem sync will be queued, and will only start when the previous
> > > > > > > sleep's filesystem sync has finished. While waiting for the previous
> > > > > > > sleep's filesystem sync to finish, the subsequent sleep will still abort
> > > > > > > early if a wakeup event is triggered, solving the original issue of
> > > > > > > filesystem sync blocking abort.
> > > > > > >
> > > > > > > [1]: https://lpc.events/event/18/contributions/1845/
> > > > > > >
> > > > > > > Suggested-by: Saravana Kannan <saravanak@...gle.com>
> > > > > > > Signed-off-by: Samuel Wu <wusamuel@...gle.com>
> > > > > > > ---
> > > > > > > Changes in v6:
> > > > > > > - Use spin_lock_irq() in thread context
> > > > > > > - Use dedicated ordered workqueue for sync work items
> > > > > > > - Use a counter instead of two bools for synchronization
> > > > > > > - Queue fs_sync if it's not already pending on workqueue
> > > > > > > > - pm_wakeup_clear(0) is a prerequisite to this feature, so move it into the function
> > > > > > > - Updated commit text for motive of back-to-back fs syncs
> > > > > > > - Tighter lock/unlock around setup, checks, and loop
> > > > > > > - Fix function definitions for CONFIG_PM_SLEEP=n
> > > > > > > - v5 link: https://lore.kernel.org/all/20251017233907.2305303-1-wusamuel@google.com/
> > > > > > >
> > > > > > > Changes in v5:
> > > > > > > - Update spin_lock() to spin_lock_irqsave() since abort can be in IRQ context
> > > > > > > - Updated changelog description to be more precise regarding continuing abort
> > > > > > > sleep before fs_sync() is complete
> > > > > > > - Rename abort_sleep_during_fs_sync() to pm_stop_waiting_for_fs_sync()
> > > > > > > - Simplify from a goto to do-while in pm_sleep_fs_sync()
> > > > > > > - v4 link: https://lore.kernel.org/all/20250911185314.2377124-1-wusamuel@google.com
> > > > > > >
> > > > > > > Changes in v4:
> > > > > > > - Removed patch 1/3 of v3 as it is already picked up on linux-pm
> > > > > > > - Squashed patches 2/3 and 3/3 from v3 into this single patch
> > > > > > > - Added abort during fs_sync functionality to hibernate in addition to suspend
> > > > > > > - Moved variables and functions for abort from power/suspend.c to power/main.c
> > > > > > > - Renamed suspend_fs_sync_with_abort() to pm_sleep_fs_sync()
> > > > > > > - Renamed suspend_abort_fs_sync() to abort_sleep_during_fs_sync()
> > > > > > > - v3 link: https://lore.kernel.org/all/20250821004237.2712312-1-wusamuel@google.com/
> > > > > > >
> > > > > > > Changes in v3:
> > > > > > > - Split v2 patch into 3 patches
> > > > > > > - Moved pm_wakeup_clear() outside of if(sync_on_suspend_enabled) condition
> > > > > > > - Updated documentation and comments within kernel/power/suspend.c
> > > > > > > - v2 link: https://lore.kernel.org/all/20250812232126.1814253-1-wusamuel@google.com/
> > > > > > >
> > > > > > > Changes in v2:
> > > > > > > - Added documentation for suspend_abort_fs_sync()
> > > > > > > - Made suspend_fs_sync_lock and suspend_fs_sync_complete declaration static
> > > > > > > - v1 link: https://lore.kernel.org/all/20250815004635.3684650-1-wusamuel@google.com
> > > > > > >
> > > > > > > drivers/base/power/wakeup.c | 8 ++++
> > > > > > > include/linux/suspend.h | 4 ++
> > > > > > > kernel/power/hibernate.c | 5 ++-
> > > > > > > kernel/power/main.c | 81 +++++++++++++++++++++++++++++++++++++
> > > > > > > kernel/power/suspend.c | 4 +-
> > > > > > > 5 files changed, 100 insertions(+), 2 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
> > > > > > > index d1283ff1080b..689c16b08b38 100644
> > > > > > > --- a/drivers/base/power/wakeup.c
> > > > > > > +++ b/drivers/base/power/wakeup.c
> > > > > > > @@ -570,6 +570,13 @@ static void wakeup_source_activate(struct wakeup_source *ws)
> > > > > > >
> > > > > > > >  	/* Increment the counter of events in progress. */
> > > > > > > >  	cec = atomic_inc_return(&combined_event_count);
> > > > > > > > +	/*
> > > > > > > > +	 * wakeup_source_activate() aborts sleep only if events_check_enabled
> > > > > > > > +	 * is set (see pm_wakeup_pending()). Similarly, abort sleep during
> > > > > > > > +	 * fs_sync only if events_check_enabled is set.
> > > > > > > > +	 */
> > > > > > > > +	if (events_check_enabled)
> > > > > > > > +		pm_stop_waiting_for_fs_sync();
> > > > > > > >
> > > > > > > >  	trace_wakeup_source_activate(ws->name, cec);
> > > > > > > >  }
> > > > > > > > @@ -899,6 +906,7 @@ EXPORT_SYMBOL_GPL(pm_wakeup_pending);
> > > > > > > >  void pm_system_wakeup(void)
> > > > > > > >  {
> > > > > > > >  	atomic_inc(&pm_abort_suspend);
> > > > > > > > +	pm_stop_waiting_for_fs_sync();
> > > > > > > >  	s2idle_wake();
> > > > > > > }
> > > > > > > EXPORT_SYMBOL_GPL(pm_system_wakeup);
> > > > > > > diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> > > > > > > index b02876f1ae38..4795f55f9cbe 100644
> > > > > > > --- a/include/linux/suspend.h
> > > > > > > +++ b/include/linux/suspend.h
> > > > > > > @@ -450,6 +450,8 @@ void restore_processor_state(void);
> > > > > > > extern int register_pm_notifier(struct notifier_block *nb);
> > > > > > > extern int unregister_pm_notifier(struct notifier_block *nb);
> > > > > > > extern void ksys_sync_helper(void);
> > > > > > > +extern void pm_stop_waiting_for_fs_sync(void);
> > > > > > > +extern int pm_sleep_fs_sync(void);
> > > > > > > extern void pm_report_hw_sleep_time(u64 t);
> > > > > > > extern void pm_report_max_hw_sleep(u64 t);
> > > > > > > void pm_restrict_gfp_mask(void);
> > > > > > > @@ -505,6 +507,8 @@ static inline void pm_restrict_gfp_mask(void) {}
> > > > > > > static inline void pm_restore_gfp_mask(void) {}
> > > > > > >
> > > > > > > static inline void ksys_sync_helper(void) {}
> > > > > > > +static inline void pm_stop_waiting_for_fs_sync(void) {}
> > > > > > > +static inline int pm_sleep_fs_sync(void) { return 0; }
> > > > > > >
> > > > > > > #define pm_notifier(fn, pri) do { (void)(fn); } while (0)
> > > > > > >
> > > > > > > diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> > > > > > > index 53166ef86ba4..1874fde4b4f3 100644
> > > > > > > --- a/kernel/power/hibernate.c
> > > > > > > +++ b/kernel/power/hibernate.c
> > > > > > > @@ -820,7 +820,10 @@ int hibernate(void)
> > > > > > > >  	if (error)
> > > > > > > >  		goto Restore;
> > > > > > > >
> > > > > > > > -	ksys_sync_helper();
> > > > > > > > +	error = pm_sleep_fs_sync();
> > > > > > > > +	if (error)
> > > > > > > > +		goto Restore;
> > > > > > > > +
> > > > > > > >  	if (filesystem_freeze_enabled)
> > > > > > > >  		filesystems_freeze();
> > > > > > >
> > > > > > > diff --git a/kernel/power/main.c b/kernel/power/main.c
> > > > > > > index a6cbc3f4347a..23ca87a172a4 100644
> > > > > > > --- a/kernel/power/main.c
> > > > > > > +++ b/kernel/power/main.c
> > > > > > > @@ -582,6 +582,84 @@ bool pm_sleep_transition_in_progress(void)
> > > > > > > {
> > > > > > > return pm_suspend_in_progress() || hibernation_in_progress();
> > > > > > > }
> > > > > > > +
> > > > > > > +static int pm_sleep_fs_syncs_queued;
> > > > > > > +static DEFINE_SPINLOCK(pm_sleep_fs_sync_lock);
> > > > > > > +static DECLARE_COMPLETION(pm_sleep_fs_sync_complete);
> > > > > > > +static struct workqueue_struct *pm_fs_sync_wq;
> > > > > > > +
> > > > > > > > +static int __init pm_start_fs_sync_workqueue(void)
> > > > > > > > +{
> > > > > > > > +	pm_fs_sync_wq = alloc_ordered_workqueue("pm_fs_sync_wq", 0);
> > > > > > > > +
> > > > > > > > +	return pm_fs_sync_wq ? 0 : -ENOMEM;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +/**
> > > > > > > > + * pm_stop_waiting_for_fs_sync - Abort fs_sync to abort sleep early
> > > > > > > > + *
> > > > > > > > + * This function causes the suspend process to stop waiting on an in-progress
> > > > > > > > + * filesystem sync, such that the suspend process can be aborted before the
> > > > > > > > + * filesystem sync is complete.
> > > > > > > > + */
> > > > > > > > +void pm_stop_waiting_for_fs_sync(void)
> > > > > > > > +{
> > > > > > > > +	unsigned long flags;
> > > > > > > > +
> > > > > > > > +	spin_lock_irqsave(&pm_sleep_fs_sync_lock, flags);
> > > > > > > > +	complete(&pm_sleep_fs_sync_complete);
> > > > > > > > +	spin_unlock_irqrestore(&pm_sleep_fs_sync_lock, flags);
> > > > > > > > +}
> > > > > >
> > > > > > Apart from the kernel test robot reports,
> > > > >
> > > > > Of course, I'll fix this in v7.
> > > > >
> > > > > > pm_stop_waiting_for_fs_sync() has become slightly too heavy for
> > > > > > calling it from wakeup_source_activate().
> > > > >
> > > > > > Trying to understand: are you saying spin_lock_irqsave() makes
> > > > > > pm_stop_waiting_for_fs_sync() too slow?
> > > >
> > > > Spin lock and the completion handling.
> > > >
> > > > This function has been designed to be as lightweight as reasonably
> > > > possible and the $subject patch is adding a branch and a global
> > > > spinlock locking to it.
> > > >
> > > > > > Waking up the suspend process from there should be sufficient. The
> > > > > > completion is not necessary for that in principle.
> > > > >
> > > > > Can you elaborate more on what "there" means and why completion isn't
> > > > > necessary? From what I can see, the only way to abort the suspend
> > > > > _early_ is with the completion.
> > > >
> > > > Well, there are wait queues.
> > > >
> > > > In the first place though, do you really need to stop the suspend
> > > > process immediately after a wakeup event?
> >
> > Yes, we would like to stop suspend as soon as a wakeup event occurs. A
> > delay in processing the wakeup event manifests as an unresponsive
> > device; for example, it is a poor user experience when pressing the
> > lock button lags on the order of seconds.
> >
> > > > This generally does not happen and wakeup sources are designed with
> > > > the assumption that it need not happen: The suspend process will check
> > > > if there is a pending wakeup at some places and wakeup sources just
> > > > need to update the counters.
> >
> > Agree, generally this isn't a concern but even 1% of cases means over
> > a million devices are affected. Coming back to the original premise,
> > the checks at discrete points are not sufficient since they don't
> > handle the cases when fs_sync takes 20+ seconds.
>
> Well, I'm not talking about this.
>
> Appended is my version of the change in question (compiled, but not tested),
> which admittedly should be split into two patches.
>
> Now, tell me why exactly the value of 5 for PM_FS_SYNC_WAKEUP_RESOLUTION is
> inadequate, because that's effectively what you're saying here.
>
> > > > Quite frankly, I don't see why the filesystem sync period needs to be
> > > > special in that respect. And if it need not be special, nothing needs
> > > > to be added to wakeup_source_activate().
> >
> > fs_sync is special because it's been empirically identified as a phase
> > of suspend which causes suspend to be more than 100x slower.
> >
> > > In case it is unclear where I'm going with this, the suspend process
> > > can be made to use wait_event_timeout() with a timeout of, say, 20 ms
> > > to wait for pm_sleep_fs_syncs_queued to drop down to 0 in a loop and
> > > check pm_wakeup_pending() in every iteration.
> >
> > Understood, this sounds similar to the polling done in
> > try_to_freeze_tasks(). However, there have been efforts to save even
> > <10ms in the suspend timeline, so the event-driven approach would be
> > more performant than polling; also, we've been refining the
> > event-driven approach in this patch for the past 6 versions.
>
> Which doesn't mean that it is the best approach.
>
> The event-driven approach adds overhead elsewhere because wakeup_source_activate()
> is mostly used outside system suspend/resume paths and in some latency-sensitive
> places.
>
> > Appreciate the feedback and discussion, please let me know what you
> > think. Thanks!
>
> So unless I'm convinced that the patch below is insufficient, I'll be reluctant
> to apply anything more complicated.
>
> ---
> kernel/power/hibernate.c | 6 +++
> kernel/power/main.c | 75 +++++++++++++++++++++++++++++++++++++++++++----
> kernel/power/power.h | 1
> kernel/power/suspend.c | 6 +++
> kernel/power/user.c | 4 +-
> 5 files changed, 83 insertions(+), 9 deletions(-)
>
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -820,7 +820,10 @@ int hibernate(void)
>  	if (error)
>  		goto Restore;
>
> -	ksys_sync_helper();
> +	error = pm_sleep_fs_sync();
> +	if (error)
> +		goto Notify;
> +
>  	if (filesystem_freeze_enabled)
>  		filesystems_freeze();
>
> @@ -892,6 +895,7 @@ int hibernate(void)
>  	freezer_test_done = false;
>   Exit:
>  	filesystems_thaw();
> + Notify:
>  	pm_notifier_call_chain(PM_POST_HIBERNATION);
>   Restore:
>  	pm_restore_console();
> --- a/kernel/power/main.c
> +++ b/kernel/power/main.c
> @@ -92,6 +92,61 @@ void ksys_sync_helper(void)
> }
> EXPORT_SYMBOL_GPL(ksys_sync_helper);
>
> +#if defined(CONFIG_SUSPEND) || defined(CONFIG_HIBERNATION)
> +/* Wakeup events handling resolution while syncing file systems in jiffies */
> +#define PM_FS_SYNC_WAKEUP_RESOLUTION 5
> +
> +static atomic_t pm_fs_sync_count = ATOMIC_INIT(0);
> +static struct workqueue_struct *pm_fs_sync_wq;
> +static DECLARE_WAIT_QUEUE_HEAD(pm_fs_sync_wait);
> +
> +static bool pm_fs_sync_completed(void)
> +{
> +	return atomic_read(&pm_fs_sync_count) == 0;
> +}
> +
> +static void pm_fs_sync_work_fn(struct work_struct *work)
> +{
> +	ksys_sync_helper();
> +
> +	if (atomic_add_return(-1, &pm_fs_sync_count) == 0)
> +		wake_up(&pm_fs_sync_wait);
> +}
> +static DECLARE_WORK(pm_fs_sync_work, pm_fs_sync_work_fn);
> +
> +/**
> + * pm_sleep_fs_sync() - Sync file systems in an interruptible way
> + *
> + * Return 0 on successful file system sync, or return -EBUSY if the file
> + * system sync was aborted.
> + */
> +int pm_sleep_fs_sync(void)
> +{
> +	pm_wakeup_clear(0);
> +
> +	/*
> +	 * Take back-to-back sleeps into account by queuing a subsequent fs sync
> +	 * only if the previous fs sync is running or is not queued. Multiple fs
> +	 * syncs increase the likelihood of saving the latest files immediately
> +	 * before sleep.
> +	 */
> +	if (!work_pending(&pm_fs_sync_work)) {
> +		atomic_inc(&pm_fs_sync_count);
> +		queue_work(pm_fs_sync_wq, &pm_fs_sync_work);
> +	}
> +
> +	while (!pm_fs_sync_completed()) {
> +		if (pm_wakeup_pending())
> +			return -EBUSY;
> +
> +		wait_event_timeout(pm_fs_sync_wait, pm_fs_sync_completed(),
> +				   PM_FS_SYNC_WAKEUP_RESOLUTION);
> +	}
> +
> +	return 0;
> +}
> +#endif /* CONFIG_SUSPEND || CONFIG_HIBERNATION */
> +
> /* Routines for PM-transition notifications */
>
> static BLOCKING_NOTIFIER_HEAD(pm_chain_head);
> @@ -231,10 +286,10 @@ static ssize_t mem_sleep_store(struct ko
> power_attr(mem_sleep);
>
> /*
> - * sync_on_suspend: invoke ksys_sync_helper() before suspend.
> + * sync_on_suspend: Sync file systems before suspend.
> *
> - * show() returns whether ksys_sync_helper() is invoked before suspend.
> - * store() accepts 0 or 1. 0 disables ksys_sync_helper() and 1 enables it.
> + * show() returns whether file systems sync before suspend is enabled.
> + * store() accepts 0 or 1. 0 disables file systems sync and 1 enables it.
> */
> bool sync_on_suspend_enabled = !IS_ENABLED(CONFIG_SUSPEND_SKIP_SYNC);
>
> @@ -1066,16 +1121,24 @@ static const struct attribute_group *att
> struct workqueue_struct *pm_wq;
> EXPORT_SYMBOL_GPL(pm_wq);
>
> -static int __init pm_start_workqueue(void)
> +static int __init pm_start_workqueues(void)
>  {
>  	pm_wq = alloc_workqueue("pm", WQ_FREEZABLE | WQ_UNBOUND, 0);
> +	if (!pm_wq)
> +		return -ENOMEM;
> +
> +#if defined(CONFIG_SUSPEND) || defined(CONFIG_HIBERNATION)
> +	pm_fs_sync_wq = alloc_ordered_workqueue("pm_fs_sync", 0);
> +	if (!pm_fs_sync_wq)
> +		return -ENOMEM;
> +#endif
>
> -	return pm_wq ? 0 : -ENOMEM;
> +	return 0;
>  }
>
>  static int __init pm_init(void)
>  {
> -	int error = pm_start_workqueue();
> +	int error = pm_start_workqueues();
>  	if (error)
>  		return error;
>  	hibernate_image_size_init();
> --- a/kernel/power/power.h
> +++ b/kernel/power/power.h
> @@ -19,6 +19,7 @@ struct swsusp_info {
> } __aligned(PAGE_SIZE);
>
> #if defined(CONFIG_SUSPEND) || defined(CONFIG_HIBERNATION)
> +extern int pm_sleep_fs_sync(void);
> extern bool filesystem_freeze_enabled;
> #endif
>
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -590,7 +590,11 @@ static int enter_state(suspend_state_t s
>
>  	if (sync_on_suspend_enabled) {
>  		trace_suspend_resume(TPS("sync_filesystems"), 0, true);
> -		ksys_sync_helper();
> +
> +		error = pm_sleep_fs_sync();
> +		if (error)
> +			goto Unlock;
> +
>  		trace_suspend_resume(TPS("sync_filesystems"), 0, false);
>  	}
>
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -278,7 +278,9 @@ static long snapshot_ioctl(struct file *
>  		if (data->frozen)
>  			break;
>
> -		ksys_sync_helper();
> +		error = pm_sleep_fs_sync();
> +		if (error)
> +			break;
>
>  		error = freeze_processes();
>  		if (error)
>
>
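For readers following the thread: the wait loop in the quoted
pm_sleep_fs_sync() can be illustrated with a small single-threaded
simulation. This is only a sketch, not kernel code. struct sim,
sim_sleep_fs_sync() and the tick-based clock are hypothetical names made
up for illustration, and the sketch assumes the background sync finishes
at a known tick and that a wakeup, if any, becomes visible at a known tick.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical simulation inputs; none of these names exist in the kernel. */
struct sim {
	unsigned int sync_done_at;  /* tick at which the background sync finishes */
	unsigned int wakeup_at;     /* tick at which a wakeup event becomes visible */
	bool wakeup_enabled;        /* false: no wakeup event ever arrives */
	unsigned int resolution;    /* polling interval in ticks */
};

/*
 * Mimic the shape of the quoted loop: check for a pending wakeup, then wait
 * for completion for at most `resolution` ticks. Returns 0 if the sync
 * completed, or -EBUSY if a wakeup was seen first. *now receives the tick
 * at which the function returned.
 */
static int sim_sleep_fs_sync(const struct sim *s, unsigned int *now)
{
	unsigned int t = 0;

	assert(s->resolution > 0);

	while (t < s->sync_done_at) {           /* stands in for !pm_fs_sync_completed() */
		if (s->wakeup_enabled && t >= s->wakeup_at) {   /* pm_wakeup_pending() */
			*now = t;
			return -EBUSY;
		}

		/* Timed wait: resume at completion or at the timeout, whichever is first. */
		unsigned int deadline = t + s->resolution;
		t = deadline < s->sync_done_at ? deadline : s->sync_done_at;
	}

	*now = t;
	return 0;
}
```

In this model, with a resolution of 5 ticks, a wakeup that becomes visible
at tick 12 during a 1000-tick sync is noticed at tick 15, so the abort
latency is bounded by the polling resolution rather than by the sync
duration.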
Appreciate the patch and feedback, Rafael. I've tested both the suspend
and hibernate call flows with this approach and it seems to work fine.
I will send out a v7 based on this patch, with a few minor edits that I
thought were appropriate. Thanks!
-- Sam
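As a rough aid for judging the 5-jiffy value of
PM_FS_SYNC_WAKEUP_RESOLUTION discussed above: how long 5 jiffies lasts
depends on the kernel's tick rate (HZ). The helper below,
jiffies_to_ms_sketch(), is a hypothetical userspace function written for
this note, not the kernel's own conversion routine, and it rounds down.

```c
#include <assert.h>

/*
 * Hypothetical helper: convert a jiffies count to whole milliseconds for a
 * given tick rate `hz` (ticks per second). Rounds down.
 */
static unsigned int jiffies_to_ms_sketch(unsigned int jiffies, unsigned int hz)
{
	assert(hz > 0);

	/* Widen before multiplying so large jiffies values cannot overflow. */
	return (unsigned int)((unsigned long long)jiffies * 1000ULL / hz);
}
```

Under this arithmetic, 5 jiffies corresponds to 50 ms at HZ=100, 20 ms at
HZ=250, and 5 ms at HZ=1000, so the 20 ms timeout suggested earlier in the
thread matches HZ=250.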