Message-ID: <6350624c-7293-43de-8788-e52a236d91fb@kylinos.cn>
Date: Sun, 8 Jun 2025 15:22:20 +0800
From: zhangzihuan <zhangzihuan@...inos.cn>
To: David Hildenbrand <david@...hat.com>, rafael@...nel.org,
len.brown@...el.com, pavel@...nel.org, kees@...nel.org, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com, akpm@...ux-foundation.org,
lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com, vbabka@...e.cz,
rppt@...nel.org, surenb@...gle.com, mhocko@...e.com
Cc: linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [RFC PATCH] PM: Optionally block user fork during freeze to
improve performance
Hi David,
Thanks for your feedback!
On 2025/6/6 15:20, David Hildenbrand wrote:
> Hi,
>
> On 06.06.25 08:25, Zihuan Zhang wrote:
>> Currently, the freezer traverses all tasks to freeze them during
>> system suspend or hibernation. If a user process forks during this
>> window, the new child may escape freezing and require a second
>> traversal or retry, adding non-trivial overhead.
>>
>> This patch introduces a CONFIG_PM_DISABLE_USER_FORK_DURING_FREEZE
>
> Not sure if a Kconfig is really the right choice here ...
>
I understand your concern. My initial thinking was to provide an opt-in
configuration so that platforms sensitive to resume performance (or
under constrained suspend time budgets) can selectively enable this
behavior.
However, I agree that a runtime mechanism, or a default-on behavior gated
by suspend state, might be cleaner. I'm happy to rework it in that
direction, e.g., based on pm_freezing or a similar runtime flag.
>> option. When enabled, it prevents user processes from creating new
>> processes (via fork/clone) during the freezing period. This guarantees
>> a stable task list and avoids re-traversing the process list due to
>> late-created user tasks, thereby improving performance.
>
> Any performance numbers to back your claims?
>
We’ve completed the performance testing. To simulate a process escape
scenario, we created a test environment where a large number of fork
operations are triggered right before the freeze phase begins.
A few details worth mentioning:
• We avoided creating too many processes at once to prevent
resource exhaustion.
• To increase the likelihood of hitting the freeze window
precisely, we skipped the filesystem freezing time during the simulation.
• We also added a small delay to each process, ensuring they
don’t all complete their fork operations before the system enters
suspend/hibernate.
• Before starting the tests, we added a debug print in
try_to_freeze_task() to log the number of freeze retry attempts for each
task.
--- begin test code ---
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <errno.h>
#include <string.h>
#include <time.h>
#define TOTAL_FORKS        1000 // Total number of forks
#define BATCH_SIZE         10   // Number of forks per batch
#define FORK_INTERVAL_US   300  // Fork interval (microseconds); only printed below
#define CHILD_LIFETIME_SEC 60   // Child lifetime (seconds); only printed below

// Sleep for the given number of microseconds using nanosleep().
void usleep_precise(int usec) {
    struct timespec ts;
    ts.tv_sec = usec / 1000000;
    ts.tv_nsec = (usec % 1000000) * 1000;
    nanosleep(&ts, NULL);
}

// Sleep for a random duration in [min_ms, max_ms] milliseconds.
void random_delay_ms(int min_ms, int max_ms) {
    int delay = min_ms + rand() % (max_ms - min_ms + 1);
    usleep_precise(delay * 1000);
}

// Sleep for a random duration in [min_us, max_us] microseconds (unused here).
void random_delay_us(int min_us, int max_us) {
    int delay = min_us + rand() % (max_us - min_us + 1);
    usleep_precise(delay);
}

int main() {
    int count = 0;
    int batch = 0;

    printf("Starting enhanced fork storm test...\n");
    printf(" Total forks: %d\n", TOTAL_FORKS);
    printf(" Batch size: %d\n", BATCH_SIZE);
    printf(" Fork interval: %d us\n", FORK_INTERVAL_US);
    printf(" Child lifetime: %d sec\n\n", CHILD_LIFETIME_SEC);

    random_delay_ms(2, 10); // Skip filesystem freeze (random 2-10 ms delay)

    while (count < TOTAL_FORKS) {
        printf("Starting batch %d...\n", ++batch);
        for (int i = 0; i < BATCH_SIZE && count < TOTAL_FORKS; i++, count++) {
            pid_t pid = fork();
            if (pid == 0) {
                printf("Child #%d (pid=%d) started\n", count, getpid());
                // sleep(CHILD_LIFETIME_SEC);
                pause(); // Child blocks until it is killed by a signal
                exit(0);
            } else if (pid < 0) {
                fprintf(stderr, "fork failed at %d: %s\n", count, strerror(errno));
                exit(1);
            }
        }
        usleep_precise(50); // Fixed 50 us pause between batches
        printf("Batch %d completed. Total forked so far: %d\n", batch, count);
    }

    printf("All %d children created. Parent process sleeping...\n", TOTAL_FORKS);
    pause();
    return 0;
}
--- end test code ---
Then compile the code:
gcc -o slow_fork slow_fork.c
and run the test script:
--- begin test code ---
#!/bin/bash
LOOPS=20
DELAY_BETWEEN_RUNS=1
NUM_FORKS_PER_ROUND=10
FORK_PIDS=()

# Stop the suspend sequence after the freezer step and hold for 3 seconds.
echo freezer > /sys/power/pm_test
echo 3 > /sys/module/suspend/parameters/pm_test_delay

for ((i=1; i<=LOOPS; i++)); do
    echo "===== Test round $i/$LOOPS ====="
    FORK_PIDS=()

    # Start the fork storms in the background.
    for ((j=1; j<=NUM_FORKS_PER_ROUND; j++)); do
        ./slow_fork &
        FORK_PIDS+=($!)
        echo " Launched slow_fork #$j (pid=${FORK_PIDS[-1]})"
    done

    # Trigger the freeze test while the forks are still running.
    echo mem > /sys/power/state

    # Clean up this round's parent processes.
    for pid in "${FORK_PIDS[@]}"; do
        kill "$pid" 2>/dev/null
    done
    for pid in "${FORK_PIDS[@]}"; do
        wait "$pid" 2>/dev/null
    done

    echo "Round $i complete. Waiting ${DELAY_BETWEEN_RUNS}s..."
    sleep $DELAY_BETWEEN_RUNS
done

# Kill any remaining children.
pkill slow_fork
echo "==== All $LOOPS rounds complete ===="
--- end test code ---
The results look like this:
dmesg | grep -E 'elap|Files|retry'
[ 585.255784] Filesystems sync: 0.010 seconds
[ 585.261620] Freezing user space processes completed (elapsed 0.005 seconds)
[ 585.263530] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[ 589.323691] Filesystems sync: 0.012 seconds
[ 589.336983] Freeing user space processes todo:0 retry:2
[ 589.336996] Freezing user space processes completed (elapsed 0.013 seconds)
[ 589.342628] Freezing remaining freezable tasks completed (elapsed 0.005 seconds)
[ 593.424317] Filesystems sync: 0.011 seconds
[ 593.446210] Freeing user space processes todo:0 retry:2
[ 593.446227] Freezing user space processes completed (elapsed 0.021 seconds)
[ 593.454303] Freezing remaining freezable tasks completed (elapsed 0.008 seconds)
[ 597.528491] Filesystems sync: 0.012 seconds
[ 597.561179] Freeing user space processes todo:0 retry:2
[ 597.561200] Freezing user space processes completed (elapsed 0.032 seconds)
[ 597.570157] Freezing remaining freezable tasks completed (elapsed 0.008 seconds)
[ 601.645391] Filesystems sync: 0.010 seconds
[ 601.682653] Freeing user space processes todo:0 retry:2
[ 601.682671] Freezing user space processes completed (elapsed 0.037 seconds)
[ 601.694401] Freezing remaining freezable tasks completed (elapsed 0.011 seconds)
[ 605.789844] Filesystems sync: 0.011 seconds
[ 605.830030] Freezing user space processes completed (elapsed 0.039 seconds)
[ 605.843602] Freezing remaining freezable tasks completed (elapsed 0.013 seconds)
[ 609.942143] Filesystems sync: 0.017 seconds
[ 609.997859] Freeing user space processes todo:0 retry:2
[ 609.997875] Freezing user space processes completed (elapsed 0.055 seconds)
[ 610.016413] Freezing remaining freezable tasks completed (elapsed 0.018 seconds)
[ 614.123700] Filesystems sync: 0.016 seconds
[ 614.187743] Freeing user space processes todo:0 retry:2
[ 614.187764] Freezing user space processes completed (elapsed 0.063 seconds)
[ 614.205004] Freezing remaining freezable tasks completed (elapsed 0.017 seconds)
[ 618.323268] Filesystems sync: 0.013 seconds
[ 618.393868] Freeing user space processes todo:0 retry:2
[ 618.393886] Freezing user space processes completed (elapsed 0.070 seconds)
[ 618.413420] Freezing remaining freezable tasks completed (elapsed 0.019 seconds)
[ 622.584589] Filesystems sync: 0.009 seconds
[ 622.676274] Freeing user space processes todo:0 retry:2
[ 622.676294] Freezing user space processes completed (elapsed 0.091 seconds)
[ 622.702762] Freezing remaining freezable tasks completed (elapsed 0.026 seconds)
[ 626.836610] Filesystems sync: 0.009 seconds
[ 626.935583] Freeing user space processes todo:0 retry:2
[ 626.935603] Freezing user space processes completed (elapsed 0.098 seconds)
[ 626.966460] Freezing remaining freezable tasks completed (elapsed 0.030 seconds)
[ 631.131669] Filesystems sync: 0.010 seconds
[ 631.249412] Freeing user space processes todo:0 retry:2
[ 631.249432] Freezing user space processes completed (elapsed 0.117 seconds)
[ 631.283333] Freezing remaining freezable tasks completed (elapsed 0.033 seconds)
[ 635.459169] Filesystems sync: 0.014 seconds
[ 635.574913] Freeing user space processes todo:0 retry:2
[ 635.574928] Freezing user space processes completed (elapsed 0.115 seconds)
[ 635.613557] Freezing remaining freezable tasks completed (elapsed 0.038 seconds)
[ 639.801842] Filesystems sync: 0.014 seconds
[ 639.949023] Freeing user space processes todo:0 retry:2
[ 639.949047] Freezing user space processes completed (elapsed 0.146 seconds)
[ 639.998032] Freezing remaining freezable tasks completed (elapsed 0.048 seconds)
[ 644.151229] Filesystems sync: 0.011 seconds
[ 644.303744] Freeing user space processes todo:0 retry:2
[ 644.303765] Freezing user space processes completed (elapsed 0.152 seconds)
[ 644.347925] Freezing remaining freezable tasks completed (elapsed 0.043 seconds)
[ 648.506472] Filesystems sync: 0.010 seconds
[ 648.647752] Freeing user space processes todo:192 retry:2
[ 648.670978] Freeing user space processes todo:0 retry:3
[ 648.670997] Freezing user space processes completed (elapsed 0.164 seconds)
[ 648.724734] Freezing remaining freezable tasks completed (elapsed 0.053 seconds)
[ 652.947466] Filesystems sync: 0.021 seconds
[ 653.112034] Freeing user space processes todo:0 retry:2
[ 653.112055] Freezing user space processes completed (elapsed 0.164 seconds)
[ 653.163845] Freezing remaining freezable tasks completed (elapsed 0.051 seconds)
[ 657.364792] Filesystems sync: 0.012 seconds
[ 657.510491] Freezing user space processes completed (elapsed 0.145 seconds)
[ 657.570268] Freezing remaining freezable tasks completed (elapsed 0.059 seconds)
[ 661.779728] Filesystems sync: 0.011 seconds
[ 661.975654] Freeing user space processes todo:0 retry:2
[ 661.975686] Freezing user space processes completed (elapsed 0.195 seconds)
[ 662.050074] Freezing remaining freezable tasks completed (elapsed 0.074 seconds)
[ 666.273377] Filesystems sync: 0.010 seconds
[ 666.481081] Freeing user space processes todo:0 retry:2
[ 666.481117] Freezing user space processes completed (elapsed 0.207 seconds)
[ 666.564340] Freezing remaining freezable tasks completed (elapsed 0.083 seconds)
In one of the 20 test rounds we observed the following log, where 192
tasks were still pending at retry 2 and a third retry was needed:
[ 648.647752] Freeing user space processes todo:192 retry:2
However, since this case reproduced in only one round, it's still
difficult to quantify exactly how much performance improvement the patch
brings.
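To put numbers on a log like the one above, a small helper along the following lines could total the user-space freeze times and count the rounds that needed extra passes. This is only a sketch: the struct and function names are made up for illustration, the "todo:N retry:M" format is taken from the debug print shown in the log (it is not an upstream kernel message), and it assumes the elapsed time is printed with exactly three fractional digits on the same line as the message.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of one dmesg capture (names are illustrative). */
struct freeze_summary {
    int rounds;        /* "Freezing user space processes completed" lines */
    int total_ms;      /* sum of their elapsed times, in milliseconds */
    int max_ms;        /* largest elapsed time, in milliseconds */
    int max_retry;     /* highest retry count seen in the debug print */
    int pending_lines; /* debug lines that reported todo > 0 */
};

/* Parse "... completed (elapsed S.mmm seconds)" for user space processes.
 * Returns 1 and stores milliseconds on a match, 0 otherwise. */
int parse_user_freeze_ms(const char *line, int *ms)
{
    static const char key[] =
        "Freezing user space processes completed (elapsed ";
    const char *p = strstr(line, key);
    int sec, msec;

    if (!p || sscanf(p + strlen(key), "%d.%3d", &sec, &msec) != 2)
        return 0;
    *ms = sec * 1000 + msec;
    return 1;
}

/* Parse the debug print "todo:N retry:M" (format assumed from the log). */
int parse_retry(const char *line, int *todo, int *retry)
{
    const char *p = strstr(line, "todo:");

    return p && sscanf(p, "todo:%d retry:%d", todo, retry) == 2;
}

/* Fold one log line into the summary; unrelated lines are ignored. */
void summary_add_line(struct freeze_summary *s, const char *line)
{
    int ms, todo, retry;

    if (parse_user_freeze_ms(line, &ms)) {
        s->rounds++;
        s->total_ms += ms;
        if (ms > s->max_ms)
            s->max_ms = ms;
    } else if (parse_retry(line, &todo, &retry)) {
        if (retry > s->max_retry)
            s->max_retry = retry;
        if (todo > 0)
            s->pending_lines++;
    }
}
```

A caller would zero-initialize a struct freeze_summary, pass each line of the dmesg output to summary_add_line() (for example from an fgets() loop on stdin), and compare the totals from runs with and without the patch.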
>>
>> The restriction is only active during the window when the system is
>> freezing user tasks. Once all tasks are frozen, or if the system aborts
>> the suspend/hibernate process, the restriction is lifted.
>> No kernel threads are affected, and kernel_create_* functions remain
>> unrestricted.
>>
>> Signed-off-by: Zihuan Zhang <zhangzihuan@...inos.cn>
>> ---
>> include/linux/suspend.h | 8 ++++++++
>> kernel/fork.c | 6 ++++++
>> kernel/power/Kconfig | 10 ++++++++++
>> kernel/power/main.c | 44 +++++++++++++++++++++++++++++++++++++++++
>> kernel/power/power.h | 4 ++++
>> kernel/power/process.c | 7 +++++++
>> 6 files changed, 79 insertions(+)
>>
>> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
>> index b1c76c8f2c82..2dd8b3eb50f0 100644
>> --- a/include/linux/suspend.h
>> +++ b/include/linux/suspend.h
>> @@ -591,4 +591,12 @@ enum suspend_stat_step {
>> void dpm_save_failed_dev(const char *name);
>> void dpm_save_failed_step(enum suspend_stat_step step);
>> +#ifdef CONFIG_PM_DISABLE_USER_FORK_DURING_FREEZE
>> +extern bool pm_block_user_fork;
>> +bool pm_should_block_fork(void);
>> +bool pm_freeze_process_in_progress(void);
>> +#else
>> +static inline bool pm_should_block_fork(void) { return false; };
>> static inline bool pm_freeze_process_in_progress(void) { return false; };
>> +#endif /* CONFIG_PM_DISABLE_USER_FORK_DURING_FREEZE */
>> #endif /* _LINUX_SUSPEND_H */
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 1ee8eb11f38b..b0bd0206b644 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -105,6 +105,7 @@
>> #include <uapi/linux/pidfd.h>
>> #include <linux/pidfs.h>
>> #include <linux/tick.h>
>> +#include <linux/suspend.h>
>> #include <asm/pgalloc.h>
>> #include <linux/uaccess.h>
>> @@ -2596,6 +2597,11 @@ pid_t kernel_clone(struct kernel_clone_args *args)
>> trace = 0;
>> }
>> +#ifdef CONFIG_PM_DISABLE_USER_FORK_DURING_FREEZE
>> + if (pm_should_block_fork() && !(current->flags & PF_KTHREAD))
>> + return -EBUSY;
>> +#endif
>
You're absolutely right: -EBUSY is not a documented return value for
fork(), and for clone3() it is only documented for specific cases, so
user space libraries like glibc are likely not prepared to handle it
gracefully.
One alternative could be to block in kernel_clone() until freezing ends,
instead of returning an error. That way, fork() would not fail, just
potentially block briefly (similar to memory pressure or cgroup limits).
Do you think that's more acceptable?
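The difference between the two behaviors can be modeled in user space. The sketch below is not kernel code and does not show how kernel_clone() would actually wait; it uses a pthread mutex and condition variable as a stand-in, and every name in it (fork_gate, fork_gate_wait(), demo_blocked_worker(), and so on) is hypothetical. A "closed" gate represents the freezing window: the caller sleeps until the gate opens instead of getting an error back.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: "closed" stands in for "freezing in progress". */
struct fork_gate {
    pthread_mutex_t lock;
    pthread_cond_t  opened;
    bool            closed;
};

void fork_gate_init(struct fork_gate *g)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->opened, NULL);
    g->closed = false;
}

/* Start of the freezing window: later callers of fork_gate_wait() block. */
void fork_gate_close(struct fork_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->closed = true;
    pthread_mutex_unlock(&g->lock);
}

/* End of the window (freeze finished or aborted): wake all waiters. */
void fork_gate_open(struct fork_gate *g)
{
    pthread_mutex_lock(&g->lock);
    g->closed = false;
    pthread_cond_broadcast(&g->opened);
    pthread_mutex_unlock(&g->lock);
}

/* Block until the gate is open rather than returning an error. */
void fork_gate_wait(struct fork_gate *g)
{
    pthread_mutex_lock(&g->lock);
    while (g->closed)
        pthread_cond_wait(&g->opened, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

struct worker {
    struct fork_gate *gate;
    int created; /* set to 1 once the "fork" is allowed to proceed */
};

static void *worker_fn(void *arg)
{
    struct worker *w = arg;

    fork_gate_wait(w->gate);
    w->created = 1; /* stands in for actually creating the child */
    return NULL;
}

/* Returns 1 if the worker made no progress while the gate was closed
 * and completed once it was opened. */
int demo_blocked_worker(void)
{
    struct fork_gate g;
    struct worker w = { &g, 0 };
    pthread_t t;
    int blocked, done;

    fork_gate_init(&g);
    fork_gate_close(&g);
    if (pthread_create(&t, NULL, worker_fn, &w) != 0)
        return 0;
    /* The worker cannot pass the gate before fork_gate_open(), however
     * the threads are scheduled, so this read is always 0. */
    blocked = (w.created == 0);
    fork_gate_open(&g);
    pthread_join(t, NULL);
    done = (w.created == 1);
    return blocked && done;
}
```

In this model the caller only observes a delay and never a new error code, which is the property the blocking approach is meant to provide.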
I’ll draft an updated version reflecting your suggestions. Really
appreciate your time and review!
Best regards,
Zihuan Zhang
> fork() is not documented to return EBUSY and for clone3() it's
> documented to only happen in specific cases.
>
> So user space is not prepared for that.
>