[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <6b28003b-58ee-4870-ade6-c488148a7b4f@kylinos.cn>
Date: Tue, 8 Jul 2025 08:56:27 +0800
From: Zihuan Zhang <zhangzihuan@...inos.cn>
To: Peter Zijlstra <peterz@...radead.org>,
"Rafael J. Wysocki" <rafael@...nel.org>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, pavel@...nel.org,
len.brown@...el.com, linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
Oleg Nesterov <oleg@...hat.com>
Subject: Re: [PATCH v3 1/1] PM / Freezer: Skip zombie/dead processes to
在 2025/7/7 16:42, Peter Zijlstra 写道:
> A quick browse through the code seems to suggest that for user tasks,
> PF_NOFREEZE is set just like exit_state, once at death.
>
I couldn’t agree more — for user tasks, PF_NOFREEZE is indeed set at the
same time as exit_state, right at death.
> For kernel threads the situation is a little more complex; but typically
> a kthread is spawned with PF_NOFREEZE set, and then some will clear it
> again, but always before then calling a TASK_FREEZABLE wait.
While that’s generally the expected pattern, it depends on each kthread
correctly managing the PF_NOFREEZE flag before entering a TASK_FREEZABLE
wait.
This assumption can be fragile in practice — a missed update or
unconventional usage could lead to inconsistent freezing behavior.
> The only thing I didn't fully investigate is this
> {,un}lock_system_sleep() thing. But that would appear to need at least
> the below fixlet.
>
> diff --git a/kernel/power/main.c b/kernel/power/main.c index
> 3d484630505a..a415e7d30a2c 100644 --- a/kernel/power/main.c +++
> b/kernel/power/main.c @@ -52,8 +52,8 @@ void pm_restrict_gfp_mask(void) unsigned int lock_system_sleep(void)
> {
> unsigned int flags = current->flags;
> - current->flags |= PF_NOFREEZE; mutex_lock(&system_transition_mutex);
> + current->flags |= PF_NOFREEZE; return flags;
> }
> EXPORT_SYMBOL_GPL(lock_system_sleep);
It seems to me that setting PF_NOFREEZE before acquiring
system_transition_mutex might be intentional — possibly to prevent
deadlocks.
If the task were to be frozen while holding or waiting for the mutex, it
could block suspend or resume paths. So changing the order may risk
breaking that protection.
So, although PF_NOFREEZE could be the better long-term solution, right
now depending exclusively on it might cause issues.
It would require further standardization and guarantees about the flag’s
stability during the freezing process before we can fully rely on it.
I’m looking forward to your thoughts on this.
>
> Anyway, this seems to suggest something relatively simple like this here
> should do:
>
> diff --git a/kernel/freezer.c b/kernel/freezer.c index
> 8d530d0949ff..8b7cecd17564 100644 --- a/kernel/freezer.c +++
> b/kernel/freezer.c @@ -162,20 +162,22 @@ static bool
> __freeze_task(struct task_struct *p) */
> bool freeze_task(struct task_struct *p)
> {
> - unsigned long flags; - - spin_lock_irqsave(&freezer_lock, flags); -
> if (!freezing(p) || frozen(p) || __freeze_task(p)) { -
> spin_unlock_irqrestore(&freezer_lock, flags); + /* + * User tasks get
> NOFREEZE in do_task_dead(). + */ + if ((p->flags & (PF_NOFREEZE |
> PF_KTHREAD)) == PF_NOFREEZE) return false;
> - }
> - if (!(p->flags & PF_KTHREAD)) - fake_signal_wake_up(p); - else -
> wake_up_state(p, TASK_NORMAL); + scoped_guard (spinlock_irqsave,
> &freezer_lock) { + if (!freezing(p) || frozen(p) || __freeze_task(p))
> + return false; + + if (!(p->flags & PF_KTHREAD)) +
> fake_signal_wake_up(p); + else + wake_up_state(p, TASK_NORMAL); + }
> - spin_unlock_irqrestore(&freezer_lock, flags); return true;
> }
Thanks for the suggestion — this looks really clean and simplifies the
logic nicely! The use of a scoped spinlock and the early return based on
PF_NOFREEZE | PF_KTHREAD makes the flow easier to follow.
By the way, in the code above, since for user tasks the PF_NOFREEZE flag
is only set once at death (similar to how exit_state is handled), would
it make sense to check p->exit_state directly here instead?
It seems semantically equivalent for user tasks, and exit_state might be
more explicit in conveying the task's lifecycle state. I'm curious if
there's a specific reason to prefer PF_NOFREEZE over exit_state in this
case.
Best regards,
Zihuan Zhang
Powered by blists - more mailing lists