linux-kernel - Re: [RFC PATCH] PM: Optionally block user fork during freeze to improve performance


Message-ID: <ee1de994-e59f-4c6c-96f3-66056b002889@kylinos.cn>
Date: Mon, 16 Jun 2025 11:46:34 +0800
From: Zihuan Zhang <zhangzihuan@...inos.cn>
To: Michal Hocko <mhocko@...e.com>
Cc: David Hildenbrand <david@...hat.com>,
 Peter Zijlstra <peterz@...radead.org>, rafael@...nel.org,
 len.brown@...el.com, pavel@...nel.org, kees@...nel.org, mingo@...hat.com,
 juri.lelli@...hat.com, vincent.guittot@...aro.org, dietmar.eggemann@....com,
 rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
 vschneid@...hat.com, akpm@...ux-foundation.org, lorenzo.stoakes@...cle.com,
 Liam.Howlett@...cle.com, vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com,
 linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC PATCH] PM: Optionally block user fork during freeze to
 improve performance

Hi Michal,

Thanks for the question.

On 2025/6/13 15:05, Michal Hocko wrote:
> On Fri 13-06-25 10:37:42, Zihuan Zhang wrote:
>> Hi David,
>> Thanks for your advice!
>>
>> On 2025/6/10 18:50, David Hildenbrand wrote:
>>> Can't this problem be mitigated by simply not scheduling the new fork'ed
>>> process while the system is frozen?
>>>
>>> Or what exact scenario are you worried about?
>> Let me revisit the core issue for clarity. Under normal conditions, most
>> processes in the system are in a sleep state, and only a few are runnable.
>> So even with thousands of processes, the freezer generally works reliably
>> and completes within a reasonable time.
> How do you define reasonable time?
>

To clarify: freezing a single process typically takes only a few dozen
microseconds. Between retries, the freezer sleeps with usleep_range(),
for about 1 ms in the first round and twice as long in each subsequent
round. Despite this delay, in our tests around 10% of the processes
were not frozen during the first pass and had to be retried.

This suggests that even with a reasonably sufficient delay, some newly 
forked processes do not get frozen in time during the first iteration, 
simply due to timing. The freeze latency itself remains small, but not 
all processes are caught on the first try.
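
To make the retry timing concrete, here is a minimal user-space sketch
of the backoff described above. It is not the kernel code:
next_sleep_usecs() and total_sleep_usecs() are names made up for
illustration, and the 8 ms cap is an assumption (the text above only
says the delay starts at about 1 ms and doubles).

```c
#define USEC_PER_MSEC 1000U

/* Assumed cap on the backoff; the mail only states that the delay doubles. */
#define MAX_SLEEP_USECS (8 * USEC_PER_MSEC)

/* Delay for the next retry round: double the current one until the cap. */
static unsigned int next_sleep_usecs(unsigned int cur)
{
	return cur < MAX_SLEEP_USECS ? cur * 2 : cur;
}

/* Upper bound on the time spent sleeping across `retries` retry rounds. */
static unsigned int total_sleep_usecs(unsigned int retries)
{
	unsigned int sleep_usecs = USEC_PER_MSEC; /* about 1 ms in the first round */
	unsigned int total = 0;

	while (retries--) {
		total += sleep_usecs;
		sleep_usecs = next_sleep_usecs(sleep_usecs);
	}
	return total;
}
```

Under these assumptions, three retry rounds add at most 1 + 2 + 4 = 7 ms
of sleeping, so every extra retry caused by a late fork costs far more
than the few dozen microseconds needed to freeze the task itself.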

>> However, in our fork-based test scenario, we observed repeated freeze
>> retries.
> Does this represent any real life scenario that happens on your system?
> In other words how often do you miss your "reasonable time" threshold
> while running a regular workload. Does the freezer ever fail?
>
> [...]

In our test scenario, although new processes can indeed be created
during the usleep_range() intervals between freeze iterations, it’s 
actually difficult to make the freezer fail outright. This is because 
user processes are forcibly frozen: when they return to user space and 
check for pending signals, they enter try_to_freeze() and transition 
into the refrigerator.

However, since the scheduler is fair by design, it gives both newly 
forked tasks and yet-to-be-frozen tasks a chance to run. This 
competition for CPU time can slightly delay the overall freeze process. 
While this typically doesn’t lead to failure, it does cause more retries 
than necessary, especially under CPU pressure.

Given that freezing is a clearly defined and semantically critical state
transition, we believe it makes sense to prioritize the execution of
tasks that are pending freezing over newly forked ones, particularly in
resource-constrained environments.
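
As a rough illustration of that ordering only (a toy model, not the
patch and not scheduler code), the preference could be sketched as
below. struct toy_task, toy_rank() and toy_pick_next() are hypothetical
names, and the three-level ranking is an assumption made for this
example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified task descriptor; not the kernel's task_struct. */
struct toy_task {
	int pid;
	bool freeze_pending;       /* asked to freeze, not yet in the refrigerator */
	bool forked_during_freeze; /* created after freezing started */
};

/* Lower rank runs first: pending-freeze tasks, then others, then new forks. */
static int toy_rank(const struct toy_task *t)
{
	if (t->forked_during_freeze)
		return 2;
	return t->freeze_pending ? 0 : 1;
}

/* Return the best-ranked task, or NULL if the array is empty. */
static const struct toy_task *toy_pick_next(const struct toy_task *tasks,
					    size_t n)
{
	const struct toy_task *best = NULL;

	for (size_t i = 0; i < n; i++)
		if (!best || toy_rank(&tasks[i]) < toy_rank(best))
			best = &tasks[i];
	return best;
}
```

In this model a task forked during the freeze only runs once no
pending-freeze task is left, so it cannot compete with them for CPU
time.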

>> You’re right, blocking fork() is quite intrusive, so it’s worth exploring
>> alternatives. We’ll try implementing your idea of preventing the newly
>> forked process from being scheduled while the system is freezing, rather
>> than failing the fork() call outright.
> Just curious, are you interested in global freezer only or is the cgroup
> freezer involved as well?
>

At this stage, our focus is mainly on the global freezer during system
suspend and hibernation (S3/S4). However, the patch builds on the
generic freezing() and freeze_task() logic, so it should work with the
cgroup freezer as well.