linux-kernel - Re: [RFC PATCH] PM: Optionally block user fork during freeze to improve performance

Message-ID: <b161b92e-c290-4a85-adc9-fbc325568e7d@kylinos.cn>
Date: Mon, 9 Jun 2025 11:46:19 +0800
From: zhangzihuan <zhangzihuan@...inos.cn>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: David Hildenbrand <david@...hat.com>, rafael@...nel.org,
 len.brown@...el.com, pavel@...nel.org, kees@...nel.org, mingo@...hat.com,
 peterz@...radead.org, juri.lelli@...hat.com, vincent.guittot@...aro.org,
 dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, vschneid@...hat.com, akpm@...ux-foundation.org,
 lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
 rppt@...nel.org, surenb@...gle.com, mhocko@...e.com,
 linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC PATCH] PM: Optionally block user fork during freeze to
 improve performance

Hi Mateusz,

Thanks again for your detailed input.

You’re absolutely right that try_to_freeze_tasks() holds
tasklist_lock during the main freeze loop, which temporarily blocks
kernel_clone() at that stage. However, based on our observations and
logs, the problem arises just after this loop, in the short window
before the system enters suspend (e.g., during the brief usleep_range()
sleep between passes), when the lock is released and fork is possible
again.

To illustrate this, I’d like to share some dmesg logs gathered during a
series of S3 suspend attempts. In most of these suspend cycles, we
intentionally run a user-space process that forks rapidly during
suspend, and we observe multiple retries during the “freezing user space
processes” phase. Selected entries are included below, after the quoted
message.

On 2025/6/8 23:50, Mateusz Guzik wrote:
> On Sun, Jun 08, 2025 at 03:22:20PM +0800, zhangzihuan wrote:
>> One alternative could be to block in kernel_clone() until freezing ends,
>> instead of returning an error. That way, fork() would not fail, just
>> potentially block briefly (similar to memory pressure or cgroup limits). Do
>> you think that's more acceptable?
> So I had a look at the freezing loop and it operates with
> tasklist_lock held, meaning it already stalls clone().
>
> try_to_freeze_tasks() in kernel/power/process.c contains:
>
> 	todo = 0;
> 	read_lock(&tasklist_lock);
> 	for_each_process_thread(g, p) {
> 		if (p == current || !freeze_task(p))
> 			continue;
>
> 		todo++;
> 	}
> 	read_unlock(&tasklist_lock);
>
> I don't get where the assumption that fork itself is a factor is coming
> from.
>
> Looking at freezing itself, it seems to me perf trouble starts with tons
> of processes existing to begin with in arbitrary states (not with racing
> against fork), requiring a retry preceded by a sleep:
>
> 	/*
> 	 * We need to retry, but first give the freezing tasks some
> 	 * time to enter the refrigerator.  Start with an initial
> 	 * 1 ms sleep followed by exponential backoff until 8 ms.
> 	 */
> 	usleep_range(sleep_usecs / 2, sleep_usecs);
> 	if (sleep_usecs < 8 * USEC_PER_MSEC)
> 		sleep_usecs *= 2;
>
> For a race against fork to have any effect, the new thread has to be
> linked in to the global list -- otherwise the todo var won't get bumped.
>
> But then if it gets added in a state which is freezable, the racing fork
> did not cause any trouble.
>
> If it gets added in a state which is *NOT* freezable by the current
> code, maybe it should be patched to be freezable.
>
> All in all I'm not confident any of this warrants any work -- do you
> have a setup where the above causes a real problem?

Here is the log:

dmesg | grep -E 'elap|Files|retry'

[ 2556.566183] Filesystems sync: 0.012 seconds
[ 2556.570653] Freeing user space processes todo:1181 retry:0
[ 2556.572719] Freeing user space processes todo:0 retry:1
[ 2556.572730] Freezing user space processes completed (elapsed 0.006 seconds)
[ 2556.573243] Freeing remaining freezable tasks todo:13 retry:0
[ 2556.574326] Freeing remaining freezable tasks todo:0 retry:1
[ 2556.574333] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)

[ 2560.647576] Filesystems sync: 0.018 seconds
[ 2560.656691] Freeing user space processes todo:2656 retry:0
[ 2560.661194] Freeing user space processes todo:327 retry:1
[ 2560.664130] Freeing user space processes todo:0 retry:2
[ 2560.664139] Freezing user space processes completed (elapsed 0.016 seconds)
[ 2560.665475] Freeing remaining freezable tasks todo:13 retry:0
[ 2560.667159] Freeing remaining freezable tasks todo:0 retry:1
[ 2560.667170] Freezing remaining freezable tasks completed (elapsed 0.003 seconds)

[ 2564.746592] Filesystems sync: 0.013 seconds
[ 2564.761025] Freeing user space processes todo:4192 retry:0
[ 2564.768048] Freeing user space processes todo:252 retry:1
[ 2564.773774] Freeing user space processes todo:0 retry:2
[ 2564.773801] Freezing user space processes completed (elapsed 0.026 seconds)
[ 2564.776704] Freeing remaining freezable tasks todo:13 retry:0
[ 2564.781867] Freeing remaining freezable tasks todo:0 retry:1
[ 2564.781887] Freezing remaining freezable tasks completed (elapsed 0.008 seconds)

[ 2568.872805] Filesystems sync: 0.010 seconds
[ 2568.893397] Freeing user space processes todo:5897 retry:0
[ 2568.903089] Freeing user space processes todo:0 retry:1
[ 2568.903102] Freezing user space processes completed (elapsed 0.030 seconds)
[ 2568.907681] Freeing remaining freezable tasks todo:13 retry:0
[ 2568.914721] Freeing remaining freezable tasks todo:0 retry:1
[ 2568.914743] Freezing remaining freezable tasks completed (elapsed 0.011 seconds)

[ 2573.019240] Filesystems sync: 0.018 seconds
[ 2573.044573] Freeing user space processes todo:7536 retry:0
[ 2573.056378] Freeing user space processes todo:261 retry:1
[ 2573.062016] Freeing user space processes todo:0 retry:2
[ 2573.062024] Freezing user space processes completed (elapsed 0.042 seconds)
[ 2573.067114] Freeing remaining freezable tasks todo:13 retry:0
[ 2573.072597] Freeing remaining freezable tasks todo:0 retry:1
[ 2573.072604] Freezing remaining freezable tasks completed (elapsed 0.010 seconds)

[ 2577.176003] Filesystems sync: 0.013 seconds
[ 2577.210773] Freeing user space processes todo:9042 retry:0
[ 2577.226116] Freeing user space processes todo:637 retry:1
[ 2577.233723] Freeing user space processes todo:0 retry:2
[ 2577.233733] Freezing user space processes completed (elapsed 0.057 seconds)
[ 2577.240897] Freeing remaining freezable tasks todo:13 retry:0
[ 2577.250898] Freeing remaining freezable tasks todo:0 retry:1
[ 2577.250928] Freezing remaining freezable tasks completed (elapsed 0.017 seconds)

[ 2581.358613] Filesystems sync: 0.014 seconds
[ 2581.397288] Freeing user space processes todo:10397 retry:0
[ 2581.415191] Freeing user space processes todo:107 retry:1
[ 2581.423085] Freeing user space processes todo:0 retry:2
[ 2581.423094] Freezing user space processes completed (elapsed 0.064 seconds)
[ 2581.431079] Freeing remaining freezable tasks todo:13 retry:0
[ 2581.441576] Freeing remaining freezable tasks todo:0 retry:1
[ 2581.441596] Freezing remaining freezable tasks completed (elapsed 0.018 seconds)

[ 2585.572128] Filesystems sync: 0.016 seconds
[ 2585.617543] Freeing user space processes todo:12330 retry:0
[ 2585.638997] Freeing user space processes todo:1227 retry:1
[ 2585.648592] Freeing user space processes todo:0 retry:2
[ 2585.648602] Freezing user space processes completed (elapsed 0.076 seconds)
[ 2585.658063] Freeing remaining freezable tasks todo:13 retry:0
[ 2585.670385] Freeing remaining freezable tasks todo:0 retry:1
[ 2585.670405] Freezing remaining freezable tasks completed (elapsed 0.021 seconds)

[ 2589.810371] Filesystems sync: 0.014 seconds
[ 2589.865483] Freeing user space processes todo:14036 retry:0
[ 2589.893513] Freeing user space processes todo:1288 retry:1
[ 2589.904032] Freeing user space processes todo:0 retry:2
[ 2589.904040] Freezing user space processes completed (elapsed 0.093 seconds)
[ 2589.914322] Freeing remaining freezable tasks todo:13 retry:0
[ 2589.925185] Freeing remaining freezable tasks todo:0 retry:1
[ 2589.925191] Freezing remaining freezable tasks completed (elapsed 0.021 seconds)

[ 2594.088171] Filesystems sync: 0.013 seconds
[ 2594.145012] Freeing user space processes todo:15947 retry:0
[ 2594.175153] Freeing user space processes todo:1521 retry:1
[ 2594.187060] Freeing user space processes todo:0 retry:2
[ 2594.187071] Freezing user space processes completed (elapsed 0.098 seconds)
[ 2594.199270] Freeing remaining freezable tasks todo:13 retry:0
[ 2594.215446] Freeing remaining freezable tasks todo:0 retry:1
[ 2594.215468] Freezing remaining freezable tasks completed (elapsed 0.028 seconds)

In the last suspend cycle, however, we did not run the fork script,
and the result is quite different:

[ 2678.840809] Filesystems sync: 0.010 seconds
[ 2678.928107] Freeing user space processes todo:16673 retry:0
[ 2678.950744] Freeing user space processes todo:0 retry:1
[ 2678.950759] Freezing user space processes completed (elapsed 0.109 seconds)
[ 2678.971389] Freeing remaining freezable tasks todo:13 retry:0
[ 2678.996021] Freeing remaining freezable tasks todo:0 retry:1
[ 2678.996043] Freezing remaining freezable tasks completed (elapsed 0.045 seconds)

Note that this cycle needs only one retry for user space processes
(todo:16673 on the first pass, todo:0 on the second), even though it
starts with the largest number of tasks to freeze.

This pattern is repeatable: when the fork script runs during the
freeze/suspend window, most cycles need a second retry (8 of the 10
fork cycles above); when it is not running, the freeze completes after
a single retry.

This indicates that user processes created after the freezing loop
begins interfere with suspend, which is consistent with a fork-escape
scenario. Since the current code doesn’t prevent forks once
tasklist_lock is released, a new child process can be created and
escape freezing altogether, leading to the need for retries and
sometimes to suspend failure.

Hope this helps clarify the issue. Happy to provide further logs or 
testing as needed.