linux-kernel - Re: [RFC PATCH v1 0/9] freezer: Introduce freeze priority model to address process dependency issues

Message-ID: <d86a9883-9d2e-4bb2-a93d-0d95b4a60e5f@kylinos.cn>
Date: Tue, 12 Aug 2025 13:57:49 +0800
From: Zihuan Zhang <zhangzihuan@...inos.cn>
To: Michal Hocko <mhocko@...e.com>, Theodore Ts'o <tytso@....edu>,
 Jan Kara <jack@...e.com>
Cc: "Rafael J . Wysocki" <rafael@...nel.org>,
 Peter Zijlstra <peterz@...radead.org>, Oleg Nesterov <oleg@...hat.com>,
 David Hildenbrand <david@...hat.com>, Jonathan Corbet <corbet@....net>,
 Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
 Vincent Guittot <vincent.guittot@...aro.org>,
 Dietmar Eggemann <dietmar.eggemann@....com>,
 Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>,
 Mel Gorman <mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>,
 len brown <len.brown@...el.com>, pavel machek <pavel@...nel.org>,
 Kees Cook <kees@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 "Liam R . Howlett" <Liam.Howlett@...cle.com>,
 Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
 Suren Baghdasaryan <surenb@...gle.com>,
 Catalin Marinas <catalin.marinas@....com>, Nico Pache <npache@...hat.com>,
 xu xin <xu.xin16@....com.cn>, wangfushuai <wangfushuai@...du.com>,
 Andrii Nakryiko <andrii@...nel.org>, Christian Brauner <brauner@...nel.org>,
 Thomas Gleixner <tglx@...utronix.de>, Jeff Layton <jlayton@...nel.org>,
 Al Viro <viro@...iv.linux.org.uk>, Adrian Ratiu
 <adrian.ratiu@...labora.com>, linux-pm@...r.kernel.org, linux-mm@...ck.org,
 linux-fsdevel@...r.kernel.org, linux-doc@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-ext4@...r.kernel.org
Subject: Re: [RFC PATCH v1 0/9] freezer: Introduce freeze priority model to
 address process dependency issues

Hi all,

We encountered an issue where the number of freeze retries increased due 
to processes stuck in D state. The logs point to jbd2-related activity.

log1:

[ 6616.650482] task:ThreadPoolForeg state:D stack:0     pid:262026 tgid:4065  ppid:2490   task_flags:0x400040 flags:0x00004004
[ 6616.650485] Call Trace:
[ 6616.650486]  <TASK>
[ 6616.650489]  __schedule+0x532/0xea0
[ 6616.650494]  schedule+0x27/0x80
[ 6616.650496]  jbd2_log_wait_commit+0xa6/0x120
[ 6616.650499]  ? __pfx_autoremove_wake_function+0x10/0x10
[ 6616.650502]  ext4_sync_file+0x1ba/0x380
[ 6616.650505]  do_fsync+0x3b/0x80

log2:

[  631.206315] jdb2_log_wait_log_commit  completed (elapsed 0.002 seconds)
[  631.215325] jdb2_log_wait_log_commit  completed (elapsed 0.001 seconds)
[  631.240704] jdb2_log_wait_log_commit  completed (elapsed 0.386 seconds)
[  631.262167] Filesystems sync: 0.424 seconds
[  631.262821] Freezing user space processes
[  631.263839] freeze round: 1, task to freeze: 852
[  631.265128] freeze round: 2, task to freeze: 2
[  631.267039] freeze round: 3, task to freeze: 2
[  631.271176] freeze round: 4, task to freeze: 2
[  631.279160] freeze round: 5, task to freeze: 2
[  631.287152] freeze round: 6, task to freeze: 2
[  631.295346] freeze round: 7, task to freeze: 2
[  631.301747] freeze round: 8, task to freeze: 2
[  631.309346] freeze round: 9, task to freeze: 2
[  631.317353] freeze round: 10, task to freeze: 2
[  631.325348] freeze round: 11, task to freeze: 2
[  631.333353] freeze round: 12, task to freeze: 2
[  631.341358] freeze round: 13, task to freeze: 2
[  631.349357] freeze round: 14, task to freeze: 2
[  631.357363] freeze round: 15, task to freeze: 2
[  631.365361] freeze round: 16, task to freeze: 2
[  631.373379] freeze round: 17, task to freeze: 2
[  631.381366] freeze round: 18, task to freeze: 2
[  631.389365] freeze round: 19, task to freeze: 2
[  631.397371] freeze round: 20, task to freeze: 2
[  631.405373] freeze round: 21, task to freeze: 2
[  631.413373] freeze round: 22, task to freeze: 2
[  631.421392] freeze round: 23, task to freeze: 1
[  631.429948] freeze round: 24, task to freeze: 1
[  631.438295] freeze round: 25, task to freeze: 1
[  631.444546] jdb2_log_wait_log_commit  completed (elapsed 0.249 seconds)
[  631.446387] freeze round: 26, task to freeze: 0
[  631.446390] Freezing user space processes completed (elapsed 0.183 seconds)
[  631.446392] OOM killer disabled.
[  631.446393] Freezing remaining freezable tasks
[  631.446656] freeze round: 1, task to freeze: 4
[  631.447976] freeze round: 2, task to freeze: 0
[  631.447978] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[  631.447980] PM: suspend debug: Waiting for 1 second(s).
[  632.450858] OOM killer enabled.
[  632.450859] Restarting tasks: Starting
[  632.453140] Restarting tasks: Done
[  632.453173] random: crng reseeded on system resumption
[  632.453370] PM: suspend exit
[  632.462799] jdb2_log_wait_log_commit  completed (elapsed 0.000 seconds)
[  632.466114] jdb2_log_wait_log_commit  completed (elapsed 0.001 seconds)

The retries appear to be caused by this commit wait, which completes
just before the final freeze round:

[  631.444546] jdb2_log_wait_log_commit  completed (elapsed 0.249 seconds)
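
To make the correlation easier to check on other captures, here is a
minimal Python sketch that pulls the "freeze round" lines out of a dmesg
dump and reports, per freezer pass, the last round number and the time
between the first and last round. It assumes the exact line format shown
in log2; the function names are made up for illustration.

```python
import re

# Matches lines in the format shown in log2, e.g.
# "[  631.265128] freeze round: 2, task to freeze: 2"
ROUND_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+freeze round:\s*(\d+), task to freeze:\s*(\d+)"
)


def freeze_passes(log_text):
    """Group (timestamp, round, tasks_left) tuples into freezer passes.

    A new pass starts whenever the round counter restarts at 1.
    """
    passes = []
    for line in log_text.splitlines():
        m = ROUND_RE.search(line)
        if not m:
            continue
        ts, rnd, left = float(m.group(1)), int(m.group(2)), int(m.group(3))
        if rnd == 1:
            passes.append([])
        if passes:  # ignore rounds seen before any "round: 1" line
            passes[-1].append((ts, rnd, left))
    return passes


def summarize(rounds):
    """Return (last round number, seconds between first and last round)."""
    return rounds[-1][1], rounds[-1][0] - rounds[0][0]


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for rounds in freeze_passes(sys.stdin.read()):
        last_round, seconds = summarize(rounds)
        print(f"rounds={last_round} duration={seconds:.3f}s")
```

A pass that needs many rounds while "task to freeze" stays at 1 or 2 is
the pattern seen in log2.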


During freezing, user processes executing jbd2_log_wait_commit() enter
D state because this function sleeps in wait_event(), which is
uninterruptible, and the wait can take tens to hundreds of milliseconds
(0.249 seconds in log2). A task in that state cannot be frozen until the
wait finishes, so when the wait overlaps with the freezer, the freezer
retries repeatedly.
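
As a rough illustration of why one long wait turns into many rounds,
here is a simplified Python model of the retry loop. It is not kernel
code. It assumes round 1 runs at t=0 and that the freezer sleeps 1 ms
after the first round, doubling up to a cap of 8 ms, which is what the
timestamp gaps in log2 suggest (about 1.3, 1.9, 4.1, then 8 ms). The
real loop may sleep for less than the nominal interval, so treat the
result as an estimate.

```python
def rounds_until_frozen(blocked_ms, first_sleep_ms=1, max_sleep_ms=8):
    """Round number of the first scan that no longer finds the task blocked.

    Simplified model (an assumption, not the kernel implementation):
    round 1 happens at t=0, the task stays unfreezable for blocked_ms,
    and the sleep between rounds doubles from first_sleep_ms up to
    max_sleep_ms.
    """
    elapsed_ms = 0
    sleep_ms = first_sleep_ms
    rnd = 1
    while elapsed_ms < blocked_ms:
        elapsed_ms += sleep_ms          # sleep, then scan again
        sleep_ms = min(sleep_ms * 2, max_sleep_ms)
        rnd += 1
    return rnd


if __name__ == "__main__":
    for ms in (10, 50, 183):
        print(ms, "ms blocked ->", rounds_until_frozen(ms), "rounds")
```

Under these assumptions a task that stays blocked for 183 ms needs 26
rounds, which lines up with the 26 rounds and 0.183 seconds in log2.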

While we understand that jbd2 is a freezable kernel thread, we would 
like to know if there is a way to freeze it earlier or freeze some 
critical processes proactively to reduce this contention.

Thanks for your input and suggestions.

On 2025/8/11 18:58, Michal Hocko wrote:
> On Mon 11-08-25 17:13:43, Zihuan Zhang wrote:
>> On 2025/8/8 16:58, Michal Hocko wrote:
> [...]
>>> Also the interface seems to be really coarse grained and it can easily
>>> turn out insufficient for other usecases while it is not entirely clear
>>> to me how this could be extended for those.
>> We recognize that the current interface is relatively coarse-grained and
>> may not be sufficient for all scenarios. The present implementation is a
>> basic version.
>>
>> Our plan is to introduce a classification-based mechanism that assigns
>> different freeze priorities according to process categories. For example,
>> filesystem and graphics-related processes will be given higher default
>> freeze priority, as they are critical in the freezing workflow. This
>> classification approach helps target important processes more precisely.
>>
>> However, this requires further testing and refinement before full
>> deployment. We believe this incremental, category-based design will make the
>> mechanism more effective and adaptable over time while keeping it
>> manageable.
> Unless there is a clear path for a more extendable interface then
> introducing this one is a no-go. We do not want to grow different ways
> to establish freezing policies.
>
> But much more fundamentally. So far I haven't really seen any argument
> why different priorities help with the underlying problem other than the
> timing might be slightly different if you change the order of freezing.
> This to me sounds like the proposed scheme mostly works around the
> problem you are seeing and as such is not a really good candidate to be
> merged as a long term solution. Not to mention with a user API that
> needs to be maintained for ever.
>
> So NAK from me on the interface.
>
Thanks for the feedback. I understand your concern that changing the 
freezer priority order looks like working around the symptom rather than 
solving the root cause.

Since the last discussion, we have analyzed the D-state processes
further and identified jbd2_log_wait_commit as the source of the long
wait. User tasks call into this function during fsync/fdatasync, and it
can take tens of milliseconds to complete. When this coincides with the
freezer operation, the tasks stay in D state and the freezer has to
retry multiple times, which increases the total freeze time.
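
For anyone who wants to see how long these waits get on their own
system, here is a minimal user-space sketch in Python that appends to a
file and times each fsync() call. The helper name, payload size, and
iteration count are arbitrary choices for illustration. The file has to
live on the ext4 filesystem under test; a tmpfs path will not exercise
jbd2 at all.

```python
import os
import tempfile
import time


def time_fsync(path, payload=b"x" * 4096, iterations=20):
    """Append payload and fsync after each write.

    Returns the list of per-call fsync latencies in seconds.
    """
    latencies = []
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for _ in range(iterations):
            os.write(fd, payload)
            start = time.monotonic()
            os.fsync(fd)  # blocks until the data is reported durable
            latencies.append(time.monotonic() - start)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    # Replace the temporary directory with a path on the ext4 mount
    # being investigated.
    with tempfile.TemporaryDirectory() as d:
        lat = time_fsync(os.path.join(d, "fsync-probe"))
        print(f"max fsync latency: {max(lat) * 1000:.3f} ms")
```

Running several of these in parallel while triggering a suspend should
make it more likely that a task is caught inside the commit wait.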

Although we know that jbd2 is a freezable kernel thread, we are
exploring whether freezing it earlier, or freezing certain key
processes first, could reduce this contention and improve freeze
completion time.


>>> I believe it would be more useful to find sources of those freezer
>>> blockers and try to address those. Making more blocked tasks
>>> __set_task_frozen compatible sounds like a general improvement in
>>> itself.
>> We have already identified some causes of D-state tasks, many of which are
>> related to the filesystem. On some systems, certain processes frequently
>> execute ext4_sync_file, and under contention this can lead to D-state tasks.
> Please work with maintainers of those subsystems to find proper
> solutions.

We’ve pulled in the jbd2 maintainer to get feedback on whether changing 
the freeze ordering for jbd2 is safe or if there’s a better approach to 
avoid the repeated retries caused by this wait.