Message-ID: <aXoWiJhcOaGGlcmk@pathway.suse.cz>
Date: Wed, 28 Jan 2026 15:00:40 +0100
From: Petr Mladek <pmladek@...e.com>
To: ysard <ysard_git@....fr>
Cc: John Ogness <john.ogness@...utronix.de>, linux-kernel@...r.kernel.org,
senozhatsky@...omium.org
Subject: Re: Regression: system freeze on resume from suspend introduced by
printk per-console suspended state
On Sat 2026-01-24 02:22:41, ysard wrote:
> On Fri 2026-01-23 13:19:34 +0100, Petr Mladek wrote:
> > Also I would expect that the userspace waits until the services
> > finish the job before suspending the kernel.
>
> It does:
>
> janv. 24 00:33:41 systemd[1]: Reached target sleep.target - Sleep.
> janv. 24 00:33:41 systemd[1]: Starting nvidia-suspend.service - NVIDIA system suspend actions...
> janv. 24 00:33:41 suspend[51525]: nvidia-suspend.service
> janv. 24 00:33:41 logger[51525]: <13>Jan 24 00:33:41 suspend: nvidia-suspend.service
> janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.373:2351): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
> janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.969:2352): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
> janv. 24 00:33:45 systemd[1]: nvidia-suspend.service: Deactivated successfully.
> janv. 24 00:33:45 systemd[1]: Finished nvidia-suspend.service - NVIDIA system suspend actions.
> janv. 24 00:33:45 systemd[1]: Starting systemd-suspend.service - System Suspend...
> janv. 24 00:33:45 systemd[1]: session-1.scope: Unit now frozen-by-parent.
> janv. 24 00:33:45 systemd[1]: user@...0.service: Unit now frozen-by-parent.
> janv. 24 00:33:45 systemd[1]: user-1000.slice: Unit now frozen-by-parent.
> janv. 24 00:33:45 systemd[1]: user.slice: Unit now frozen.
> janv. 24 00:33:45 systemd-sleep[51562]: Successfully froze unit 'user.slice'.
> janv. 24 00:33:45 systemd-sleep[51562]: Performing sleep operation 'suspend'...
> janv. 24 00:33:45 kernel: PM: suspend entry (deep)
OK.
> Yes I have a reproducible pattern here. With the service disabled.
> The service `nvidia-resume.service` (which basically calls the script
> with the 'resume' argument) is expected to start if the resume is
> completed, but the system does not reach this stage during the freeze.
>
> No freeze:
> $ sudo sh -c "
> mkdir -p /var/run/nvidia-sleep \
> && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
> && chvt 63 \
> && systemctl suspend"
>
> Freeze:
> $ sudo sh -c "
> mkdir -p /var/run/nvidia-sleep \
> && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
> && chvt 63 \
> && echo suspend >/proc/driver/nvidia/suspend \
> && systemctl suspend"
>
> So the problem is related to this command:
> $ echo suspend >/proc/driver/nvidia/suspend
>
> Note that without the systemctl command, this command suspends and wakes up the GPU correctly:
> $ sudo sh -c "
> chvt 63 \
> && echo suspend >/proc/driver/nvidia/suspend; \
> sleep 4; \
> echo resume >/proc/driver/nvidia/suspend; \
> chvt 2"
Interesting. It looks like the nvidia suspend does something which
breaks the system suspend. But the driver is able to revert it...
To be honest, I do not have any theory which could explain this.
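For repeated testing, the two reported sequences can be wrapped in a small helper. This is only a sketch built from the commands quoted above; the function names repro_commands and run_repro and the RUN variable are hypothetical. By default it only prints the commands so they can be reviewed before anything is suspended.

```shell
#!/bin/sh
# Sketch of the reproduction sequences from the report (hypothetical helper).
# Variant "freeze" adds the write to /proc/driver/nvidia/suspend,
# variant "nofreeze" omits it.
repro_commands() {
    variant=$1
    echo 'mkdir -p /var/run/nvidia-sleep'
    echo 'echo 2 > /var/run/nvidia-sleep/Xorg.vt_number'
    echo 'chvt 63'
    if [ "$variant" = freeze ]; then
        echo 'echo suspend > /proc/driver/nvidia/suspend'
    fi
    echo 'systemctl suspend'
}

# Print the commands by default; execute them as root only when RUN=1.
run_repro() {
    if [ "${RUN:-0}" = 1 ]; then
        # "sh -e" stops at the first failing step, like the && chain.
        repro_commands "$1" | sudo sh -e
    else
        repro_commands "$1"
    fi
}
```

Usage: "run_repro nofreeze" prints the sequence that resumed correctly, and "RUN=1 run_repro freeze" would execute the sequence that froze.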
But I have found a bug in John's debug patch from
https://lore.kernel.org/all/877bts1ltv.fsf@jogness.linutronix.de/
The patch tried to restore the original behavior on current mainline.
But the console_suspend()/console_resume() functions were recently renamed
to console_suspend_all()/console_resume_all(). The original
names are now used for the console-specific suspend/resume variants,
see
https://lore.kernel.org/all/20250226-printk-renaming-v1-0-0b878577f2e6@suse.com/
Also the debug patch did not revert synchronize_srcu(). I guess that
this was intentional. But I would rather revert it as well because
it is a potentially blocking operation.
Could you please test it with this fixed version of the debug patch?
If the patch helps, by chance, then please try to uncomment
the synchronize_srcu() calls and check if it still works.
I wonder if they make any difference.
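For the second test, the commented-out calls can be restored with a small helper. This is a minimal sketch, assuming the patched lines look exactly like "//<whitespace>synchronize_srcu(&console_srcu);"; the function name uncomment_synchronize_srcu is hypothetical.

```shell
#!/bin/sh
# Re-enable the synchronize_srcu() calls that the debug patch comments out.
# Assumption: each commented line starts with "//", followed by optional
# whitespace and then the call itself.
uncomment_synchronize_srcu() {
    file=$1
    # Strip only the leading "//" and keep the original indentation.
    sed -E 's|^//([[:space:]]*synchronize_srcu\(&console_srcu\);)|\1|' \
        "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

Usage: run "uncomment_synchronize_srcu kernel/printk/printk.c" in the patched tree, then rebuild the kernel and repeat the suspend test.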
From a36b57cbcb239e7e5af4fb8278690cd4965d6fc0 Mon Sep 17 00:00:00 2001
From: John Ogness <john.ogness@...utronix.de>
Date: Thu, 8 Jan 2026 10:49:24 +0106
Subject: [DEBUG v2] printk: Debug new vs. old suspend/resume behavior
This is just for debugging. It should restore the old console_lock
behavior for suspend/resume and also add some debugging information.
Please compile with CONFIG_PRINTK_CALLER=y so that we can see which
tasks are locking/unlocking the console during suspend/resume.
Changes against v1:
- Set/Clear the global "console_suspended" variable in
console_suspend_all()/console_resume_all() instead of
console_suspend()/console_resume().
The functions have been renamed recently, see
https://lore.kernel.org/all/20250226-printk-renaming-v1-0-0b878577f2e6@suse.com/
- Do not call synchronize_srcu() in the suspend/resume functions.
These calls are another potentially blocking operation added by
the problematic commit 9e70a5e109a4a2336 ("printk: Add per-console
suspended state").
Signed-off-by: John Ogness <john.ogness@...utronix.de>
Signed-off-by: Petr Mladek <pmladek@...e.com>
---
kernel/printk/printk.c | 64 ++++++++++++++++++++++++++++++++++++++++--
1 file changed, 62 insertions(+), 2 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1d765ad242b8..23fddc4006d3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -356,6 +356,22 @@ static void __up_console_sem(unsigned long ip)
*/
static int console_locked;
+static int console_suspended;
+
+int vprintk_store(int facility, int level,
+ const struct dev_printk_info *dev_info,
+ const char *fmt, va_list args);
+
+/* Helper function to store-only. */
+static void printk_store(const char *fmt, ...)
+{
+ va_list args;
+
+ va_start(args, fmt);
+ vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
+ va_end(args);
+}
+
/*
* Array of consoles built from command line options (console=)
*/
@@ -2748,6 +2764,12 @@ void console_suspend_all(void)
if (!console_suspend_enabled)
return;
+ console_lock();
+ console_suspended = 1;
+ printk_store(KERN_INFO "printk: %s\n", __func__);
+ /* Unlock directly (i.e. without clearing @console_locked). */
+ up_console_sem();
+
console_list_lock();
for_each_console(con)
console_srcu_write_flags(con, con->flags | CON_SUSPENDED);
@@ -2759,7 +2781,7 @@ void console_suspend_all(void)
* is guaranteed that all printing has stopped when this function
* completes.
*/
- synchronize_srcu(&console_srcu);
+// synchronize_srcu(&console_srcu);
}
void console_resume_all(void)
@@ -2785,7 +2807,17 @@ void console_resume_all(void)
* contexts must be able to see they are no longer suspended so
* that they are guaranteed to wake up and resume printing.
*/
- synchronize_srcu(&console_srcu);
+// synchronize_srcu(&console_srcu);
+
+ down_console_sem();
+ printk_store(KERN_INFO "printk: %s\n", __func__);
+ console_suspended = 0;
+ /*
+ * Perform a regular unlock.
+ * Here console_locked=1 and console_may_schedule=1.
+ * @console_locked will be cleared.
+ */
+ console_unlock();
}
printk_get_console_flush_type(&ft);
@@ -2841,6 +2873,15 @@ void console_lock(void)
msleep(1000);
down_console_sem();
+ if (console_suspended) {
+ printk_store(KERN_INFO "printk: %s\n", __func__);
+ /*
+ * Keep console locked, but do not touch
+ * @console_locked or @console_may_schedule.
+ * (Although they will both be 1 here anyway.)
+ */
+ return;
+ }
console_locked = 1;
console_may_schedule = 1;
}
@@ -2861,6 +2902,15 @@ int console_trylock(void)
return 0;
if (down_trylock_console_sem())
return 0;
+ if (console_suspended) {
+ printk_store(KERN_INFO "printk: %s\n", __func__);
+ /*
+ * The lock was acquired, but unlock directly and report
+ * failure. Here console_locked=1 and console_may_schedule=1.
+ */
+ up_console_sem();
+ return 0;
+ }
console_locked = 1;
console_may_schedule = 0;
return 1;
@@ -3354,6 +3404,16 @@ void console_unlock(void)
{
struct console_flush_type ft;
+ if (console_suspended) {
+ printk_store(KERN_INFO "printk: %s\n", __func__);
+ /*
+ * Simply unlock directly.
+ * Here console_locked=1 and console_may_schedule=1.
+ */
+ up_console_sem();
+ return;
+ }
+
printk_get_console_flush_type(&ft);
if (ft.legacy_direct)
__console_flush_and_unlock();
--
2.52.0
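
The debug patch stores one "printk: <function name>" line whenever one of the patched console functions takes its debug path. The following is a minimal sketch for summarizing these lines per caller. It assumes the usual dmesg layout "[timestamp] [caller] message" that CONFIG_PRINTK_CALLER=y produces; the function name summarize_console_markers is hypothetical.

```shell
#!/bin/sh
# Count the "printk: <function>" markers stored by the debug patch,
# grouped by caller ID (T<pid> or C<cpu>) and function name.
# Assumption: input lines look like "[  ts] [  T123] printk: console_lock".
summarize_console_markers() {
    # Keep only the marker lines of the five patched functions.
    grep -E 'printk: console_(suspend_all|resume_all|lock|trylock|unlock)$' |
    # Reduce each line to "<caller> <function>"; lines without a caller
    # field are passed through unchanged.
    sed -E 's/^.*\[ *([TC][0-9]+)\] printk: (console_[a-z_]+)$/\1 \2/' |
    sort | uniq -c
}
```

Usage: after a resume, "dmesg | summarize_console_markers" shows how often each task locked or unlocked the console while it was suspended.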