Message-Id: <20250807121418.139765-1-zhangzihuan@kylinos.cn>
Date: Thu,  7 Aug 2025 20:14:09 +0800
From: Zihuan Zhang <zhangzihuan@...inos.cn>
To: "Rafael J . Wysocki" <rafael@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Oleg Nesterov <oleg@...hat.com>,
	David Hildenbrand <david@...hat.com>,
	Michal Hocko <mhocko@...e.com>,
	Jonathan Corbet <corbet@....net>
Cc: Ingo Molnar <mingo@...hat.com>,
	Juri Lelli <juri.lelli@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ben Segall <bsegall@...gle.com>,
	Mel Gorman <mgorman@...e.de>,
	Valentin Schneider <vschneid@...hat.com>,
	len brown <len.brown@...el.com>,
	pavel machek <pavel@...nel.org>,
	Kees Cook <kees@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	"Liam R . Howlett" <Liam.Howlett@...cle.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Mike Rapoport <rppt@...nel.org>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Catalin Marinas <catalin.marinas@....com>,
	Nico Pache <npache@...hat.com>,
	xu xin <xu.xin16@....com.cn>,
	wangfushuai <wangfushuai@...du.com>,
	Andrii Nakryiko <andrii@...nel.org>,
	Christian Brauner <brauner@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Jeff Layton <jlayton@...nel.org>,
	Al Viro <viro@...iv.linux.org.uk>,
	Adrian Ratiu <adrian.ratiu@...labora.com>,
	linux-pm@...r.kernel.org,
	linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org,
	linux-doc@...r.kernel.org,
	linux-kernel@...r.kernel.org,
	Zihuan Zhang <zhangzihuan@...inos.cn>
Subject: [RFC PATCH v1 0/9] freezer: Introduce freeze priority model to address process dependency issues

The Linux task freezer was designed in a much earlier era, when userspace was relatively simple and flat.
Over the years, as modern desktop and mobile systems have become increasingly complex—with intricate IPC,
asynchronous I/O, and deep event loops—the original freezer model has shown its age.

## Background

Currently, the freezer traverses the task list linearly and attempts to freeze all tasks equally.
It sends each task a fake signal and waits for `frozen()` to become true for that task. While this model works well in many cases, it has several inherent limitations:

- Signal-based logic cannot freeze uninterruptible (D-state) tasks
- Dependencies between processes can cause freeze retries 
- Retry-based recovery introduces unpredictable suspend latency

## Real-world problem illustration

Consider the following scenario during suspend:

Freeze Window Begins

    [process A] - epoll_wait()
        │
        ▼
    [process B] - event source (already frozen)

→ A enters D-state while waiting for B
→ Cannot respond to freezing signal
→ Freezer retries in a loop
→ Suspend latency spikes

In such cases, we observed that a normal 1–2ms freezer cycle could balloon to **tens of milliseconds**. 
Worse, the kernel has no insight into the root cause and simply retries blindly.

## Proposed solution: Freeze priority model

To address this, we propose a **layered freeze model** based on per-task freeze priorities.

### Design

We introduce 4 levels of freeze priority:


| Priority | Level             | Description                         |
|----------|-------------------|-------------------------------------|
| 0        | HIGH              | D-state tasks                       |
| 1        | NORMAL            | regular user-space tasks            |
| 2        | LOW               | not yet used                        |
| 4        | NEVER_FREEZE      | zombie tasks, PF_SUSPEND_TASK tasks |

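The levels in the table could be written down as a C enum with a default classifier. This is only a sketch for illustration: the identifier names (`FREEZE_PRIORITY_*`, `struct task_info`, `default_freeze_priority()`) are hypothetical and are not taken from the patches. Only the numeric values and the descriptions come from the table.

```c
#include <assert.h>
#include <stdbool.h>

/* Numeric values follow the table; the identifier names are hypothetical. */
enum freeze_priority {
	FREEZE_PRIORITY_HIGH         = 0, /* D-state tasks */
	FREEZE_PRIORITY_NORMAL       = 1, /* regular user-space tasks */
	FREEZE_PRIORITY_LOW          = 2, /* not yet used */
	FREEZE_PRIORITY_NEVER_FREEZE = 4, /* zombies, PF_SUSPEND_TASK tasks */
};

/* Simplified user-space stand-in for the task attributes that matter here. */
struct task_info {
	bool uninterruptible; /* task is in D-state */
	bool zombie;          /* task has exited but is not yet reaped */
	bool suspend_task;    /* task carries PF_SUSPEND_TASK */
};

/* Pick a default level for a task according to the table's descriptions. */
static enum freeze_priority default_freeze_priority(const struct task_info *t)
{
	assert(t);

	if (t->zombie || t->suspend_task)
		return FREEZE_PRIORITY_NEVER_FREEZE;
	if (t->uninterruptible)
		return FREEZE_PRIORITY_HIGH;
	return FREEZE_PRIORITY_NORMAL;
}
```

In this sketch the never-freeze check runs first, so a zombie that also happens to be in D-state is still excluded from freezing.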

The kernel will freeze tasks **in priority order**, so that higher-priority tasks are frozen first.
Freezing control or event-source threads before the tasks that depend on them keeps dependent tasks from entering D-state prematurely.
This avoids dependency inversion and provides a deterministic path forward for tricky cases.
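
The priority-ordered traversal described above can be illustrated with a small user-space simulation. It is a sketch under stated assumptions, not the code from the patches: it assumes that a lower numeric value means "frozen earlier", that NEVER_FREEZE tasks are skipped entirely, and that every freeze attempt succeeds. All names (`struct sim_task`, `layered_freeze()`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Values follow the priority table; identifier names are hypothetical. */
enum {
	FREEZE_PRIORITY_HIGH         = 0,
	FREEZE_PRIORITY_NORMAL       = 1,
	FREEZE_PRIORITY_LOW          = 2,
	FREEZE_PRIORITY_NEVER_FREEZE = 4,
};

struct sim_task {
	const char *name;
	int prio;
	int frozen;
};

/*
 * Run one pass over the task array per freezable priority level,
 * highest priority (lowest number) first. The index of each task is
 * appended to order[] as it is frozen; order[] needs room for n entries.
 * Returns the number of tasks frozen.
 */
static size_t layered_freeze(struct sim_task *tasks, size_t n, size_t *order)
{
	static const int levels[] = {
		FREEZE_PRIORITY_HIGH,
		FREEZE_PRIORITY_NORMAL,
		FREEZE_PRIORITY_LOW,
	};
	size_t count = 0;

	assert(tasks && order);

	for (size_t l = 0; l < sizeof(levels) / sizeof(levels[0]); l++) {
		for (size_t i = 0; i < n; i++) {
			if (tasks[i].prio != levels[l] || tasks[i].frozen)
				continue;
			tasks[i].frozen = 1; /* stands in for the real freeze attempt */
			order[count++] = i;
		}
	}
	return count;
}
```

The sketch also makes the traversal cost visible: the task array is walked once per level rather than once in total.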

Although introducing more fine-grained freeze_priority levels improves extensibility and allows better modeling of task dependencies, 
it may also introduce additional overhead during task traversal, potentially affecting freezer performance.

In our test environment, increasing the maximum freeze retries to 16 only added ~4ms of overhead to the total suspend latency,
suggesting the added robustness comes at a relatively low cost. However, for latency-critical systems, this trade-off should be carefully evaluated.

## Benefits

- Solves D-state process freeze stalls caused by premature freezing of dependencies
- Enables more robust and reliable suspend/resume on complex userspace systems
- Introduces extensibility: tasks can be categorized by role, urgency, or dependency
- Reduces race conditions by introducing deterministic freezing order

## Previous Discussion
Link: https://lore.kernel.org/all/20250606062502.19607-1-zhangzihuan@kylinos.cn/
Link: https://lore.kernel.org/all/1ca889fd-6ead-4d4f-a3c7-361ea05bb659@kylinos.cn/

## Future directions

This framework opens up several promising areas for further development:

1. Adaptive behavior based on runtime statistics or retry feedback
The freezer adapts dynamically during suspend/hibernate based on the number of retries and which tasks failed to freeze. 
Tasks that failed in previous rounds will be assigned a higher freeze priority, improving convergence speed and reducing unnecessary retries.

2. cgroup-aware hierarchical freezing for containerized systems
The design supports cgroup-aware task traversal and freezing. 
This ensures compatibility with containerized environments, allowing for better control and visibility when freezing processes in different cgroups.

3. Unified freezing of userspace processes and kernel threads
Based on extensive testing, we found that freezing userspace tasks and kernel threads together works reliably in practice. 
Separating them does not resolve dependency issues between user and kernel context. Moreover, most kernel threads are marked as non-freezable,
so including them in the same freeze pass does not impact correctness and simplifies the logic.
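
The adaptive behavior in item 1 could look roughly like the following user-space sketch. The promotion rule is an assumption made for illustration (one level per failed round, never touching NEVER_FREEZE tasks); the actual policy in the patches may differ, and the names `struct sim_task` and `promote_failed_tasks()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Values follow the priority table; identifier names are hypothetical. */
enum {
	FREEZE_PRIORITY_HIGH         = 0,
	FREEZE_PRIORITY_NORMAL       = 1,
	FREEZE_PRIORITY_LOW          = 2,
	FREEZE_PRIORITY_NEVER_FREEZE = 4,
};

struct sim_task {
	int prio;
	int frozen; /* nonzero if the task froze in the last round */
};

/*
 * After a failed freeze round, move every task that did not freeze one
 * level up (numerically down), so that it is handled earlier next time.
 * Tasks already at HIGH and NEVER_FREEZE tasks are left alone.
 * Returns the number of tasks whose priority was raised.
 */
static size_t promote_failed_tasks(struct sim_task *tasks, size_t n)
{
	size_t promoted = 0;

	assert(tasks);

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].frozen)
			continue;
		if (tasks[i].prio == FREEZE_PRIORITY_NEVER_FREEZE)
			continue;
		if (tasks[i].prio > FREEZE_PRIORITY_HIGH) {
			tasks[i].prio--;
			promoted++;
		}
	}
	return promoted;
}
```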

Although the current implementation is relatively simple, it already helps alleviate some suspend failures caused by tasks stuck in D-state.
In our testing, we observed that certain D-state tasks are triggered by filesystem sync operations during the freezing phase.
At this stage, we don't yet have a comprehensive solution for that class of problems.
This patchset represents a testable version of our design. We plan to further investigate and address such filesystem-related D-state issues in future revisions.

Patch summary:
 - Patches 1-3: Core infrastructure (field, API, layered freeze logic)
 - Patches 4-7: Default priorities and dynamic adjustments
 - Patch 8: Statistics (freeze pass retry count)
 - Patch 9: Procfs interface for userspace access
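
For the procfs interface, a user-space reader might look like the sketch below. The path `/proc/<pid>/freeze_priority` comes from the title of patch 9; the assumption that the file holds a single decimal integer is not confirmed by this cover letter, and the helper names are hypothetical. On a kernel without this series the file does not exist and the read simply fails.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Build "/proc/<pid>/freeze_priority" into buf.
 * Returns the length snprintf() would have written.
 */
static int freeze_priority_path(pid_t pid, char *buf, size_t len)
{
	assert(buf && len > 0);
	return snprintf(buf, len, "/proc/%ld/freeze_priority", (long)pid);
}

/*
 * Read the priority of a task. Assumes the file contains one decimal
 * integer (an assumption, see above). Returns 0 on success, or -1 if
 * the file cannot be opened or parsed.
 */
static int read_freeze_priority(pid_t pid, int *prio)
{
	char path[64];
	FILE *f;
	int ok;

	assert(prio);

	if (freeze_priority_path(pid, path, sizeof(path)) >= (int)sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = (fscanf(f, "%d", prio) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```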

Zihuan Zhang (9):
  freezer: Introduce freeze_priority field in task_struct
  freezer: Introduce API to set per-task freeze priority
  freezer: Add per-priority layered freeze logic
  freezer: Set default freeze priority for userspace tasks
  freezer: set default freeze priority for PF_SUSPEND_TASK processes
  freezer: Set default freeze priority for zombie tasks
  freezer: raise freeze priority of tasks failed to freeze last time
  freezer: Add retry count statistics for freeze pass iterations
  proc: Add /proc/<pid>/freeze_priority interface

 Documentation/filesystems/proc.rst | 14 ++++++-
 fs/proc/base.c                     | 64 ++++++++++++++++++++++++++++++
 include/linux/freezer.h            | 20 ++++++++++
 include/linux/sched.h              |  3 ++
 kernel/fork.c                      |  1 +
 kernel/power/process.c             | 23 ++++++++++-
 kernel/sched/core.c                |  2 +
 7 files changed, 124 insertions(+), 3 deletions(-)

-- 
2.25.1

