Message-ID: <aXoWiJhcOaGGlcmk@pathway.suse.cz>
Date: Wed, 28 Jan 2026 15:00:40 +0100
From: Petr Mladek <pmladek@...e.com>
To: ysard <ysard_git@....fr>
Cc: John Ogness <john.ogness@...utronix.de>, linux-kernel@...r.kernel.org,
	senozhatsky@...omium.org
Subject: Re: Regression: system freeze on resume from suspend introduced by
 printk per-console suspended state

On Sat 2026-01-24 02:22:41, ysard wrote:
> On Fri 2026-01-23 13:19:34 +0100, Petr Mladek wrote:
> > Also I would expect that the userspace waits until the services
> > finish the job before suspending the kernel.
> 
> It does:
> 
>     janv. 24 00:33:41 systemd[1]: Reached target sleep.target - Sleep.
>     janv. 24 00:33:41 systemd[1]: Starting nvidia-suspend.service - NVIDIA system suspend actions...
>     janv. 24 00:33:41 suspend[51525]: nvidia-suspend.service
>     janv. 24 00:33:41 logger[51525]: <13>Jan 24 00:33:41 suspend: nvidia-suspend.service
>     janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.373:2351): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
>     janv. 24 00:33:42 kernel: audit: type=1400 audit(1769211222.969:2352): apparmor="ALLOWED" operation="open" class="file" profile="Xorg" name="/dev/nvidiactl" pid=1441 comm="Xorg" requested_mask="wr" denied_mask="wr" fsuid=0 ouid=0
>     janv. 24 00:33:45 systemd[1]: nvidia-suspend.service: Deactivated successfully.
>     janv. 24 00:33:45 systemd[1]: Finished nvidia-suspend.service - NVIDIA system suspend actions.
>     janv. 24 00:33:45 systemd[1]: Starting systemd-suspend.service - System Suspend...
>     janv. 24 00:33:45 systemd[1]: session-1.scope: Unit now frozen-by-parent.
>     janv. 24 00:33:45 systemd[1]: user@...0.service: Unit now frozen-by-parent.
>     janv. 24 00:33:45 systemd[1]: user-1000.slice: Unit now frozen-by-parent.
>     janv. 24 00:33:45 systemd[1]: user.slice: Unit now frozen.
>     janv. 24 00:33:45 systemd-sleep[51562]: Successfully froze unit 'user.slice'.
>     janv. 24 00:33:45 systemd-sleep[51562]: Performing sleep operation 'suspend'...
>     janv. 24 00:33:45 kernel: PM: suspend entry (deep)

OK.

> Yes I have a reproducible pattern here. With the service disabled.
> The service `nvidia-resume.service` (which basically calls the script
> with the 'resume' argument) is expected to start if the resume is 
> completed, but the system does not reach this stage during the freeze.
> 
> No freeze:
>     $ sudo sh -c "
>     mkdir -p /var/run/nvidia-sleep \
>     && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>     && chvt 63 \
>     && systemctl suspend"
> 
> Freeze:
>     $ sudo sh -c "
>     mkdir -p /var/run/nvidia-sleep \
>     && echo 2 > /var/run/nvidia-sleep/Xorg.vt_number \
>     && chvt 63 \
>     && echo suspend >/proc/driver/nvidia/suspend \
>     && systemctl suspend"
> 
> So the problem is related to this command:
>     $ echo suspend >/proc/driver/nvidia/suspend
> 
> Note that without the systemctl order this command suspends and wakes up the gpu correctly:
>     $ sudo sh -c "
>     chvt 63 \
>     && echo suspend >/proc/driver/nvidia/suspend; \
>     sleep 4; \
>     echo resume >/proc/driver/nvidia/suspend; \
>     chvt 2"

Interesting. It looks like the nvidia suspend does something which
breaks the system suspend. But the driver is able to revert it...

To be honest, I do not have any theory which could explain this.

But I have found a bug in John's debug patch from
https://lore.kernel.org/all/877bts1ltv.fsf@jogness.linutronix.de/

The patch tried to restore the original behavior on current mainline.
But the console_suspend()/console_resume() functions have recently been
renamed to console_suspend_all()/console_resume_all(). The original
names were reused for the console-specific suspend/resume variants,
see
https://lore.kernel.org/all/20250226-printk-renaming-v1-0-0b878577f2e6@suse.com/

Also the debug patch did not revert synchronize_srcu(). I guess that
this was intentional. But I would rather revert it as well because
it is a potentially blocking operation.

Could you please test it with this fixed version of the debug patch?

If the patch happens to help, then please try to uncomment
the synchronize_srcu() calls and check whether it still works.
I wonder whether they make a difference.

From a36b57cbcb239e7e5af4fb8278690cd4965d6fc0 Mon Sep 17 00:00:00 2001
From: John Ogness <john.ogness@...utronix.de>
Date: Thu, 8 Jan 2026 10:49:24 +0106
Subject: [DEBUG v2] printk: Debug new vs. old suspend/resume behavior

This is just for debugging. It should restore the old console_lock
behavior for suspend/resume and also add some debugging information.
Please compile with CONFIG_PRINTK_CALLER=y so that we can see which
tasks are locking/unlocking the console during suspend/resume.

Changes against v1:

- Set/Clear the global "console_suspended" variable in
  console_suspend_all()/console_resume_all() instead of
  console_suspend()/console_resume().

  The functions have been renamed recently, see
  https://lore.kernel.org/all/20250226-printk-renaming-v1-0-0b878577f2e6@suse.com/

- Do not call synchronize_srcu() in the suspend/resume functions.
  They are another potentially blocking operation added by
  the problematic commit 9e70a5e109a4a2336 ("printk: Add per-console
  suspended state").

Signed-off-by: John Ogness <john.ogness@...utronix.de>
Signed-off-by: Petr Mladek <pmladek@...e.com>
---
 kernel/printk/printk.c | 64 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 62 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1d765ad242b8..23fddc4006d3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -356,6 +356,22 @@ static void __up_console_sem(unsigned long ip)
  */
 static int console_locked;
 
+static int console_suspended;
+
+int vprintk_store(int facility, int level,
+		  const struct dev_printk_info *dev_info,
+		  const char *fmt, va_list args);
+
+/* Helper function to store-only. */
+static void printk_store(const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	vprintk_store(0, LOGLEVEL_DEFAULT, NULL, fmt, args);
+	va_end(args);
+}
+
 /*
  *	Array of consoles built from command line options (console=)
  */
@@ -2748,6 +2764,12 @@ void console_suspend_all(void)
 	if (!console_suspend_enabled)
 		return;
 
+	console_lock();
+	console_suspended = 1;
+	printk_store(KERN_INFO "printk: %s\n", __func__);
+	/* Unlock directly (i.e. without clearing @console_locked). */
+	up_console_sem();
+
 	console_list_lock();
 	for_each_console(con)
 		console_srcu_write_flags(con, con->flags | CON_SUSPENDED);
@@ -2759,7 +2781,7 @@ void console_suspend_all(void)
 	 * is guaranteed that all printing has stopped when this function
 	 * completes.
 	 */
-	synchronize_srcu(&console_srcu);
+//	synchronize_srcu(&console_srcu);
 }
 
 void console_resume_all(void)
@@ -2785,7 +2807,17 @@ void console_resume_all(void)
 		 * contexts must be able to see they are no longer suspended so
 		 * that they are guaranteed to wake up and resume printing.
 		 */
-		synchronize_srcu(&console_srcu);
+//		synchronize_srcu(&console_srcu);
+
+		down_console_sem();
+		printk_store(KERN_INFO "printk: %s\n", __func__);
+		console_suspended = 0;
+		/*
+		 * Perform a regular unlock.
+		 * Here console_locked=1 and console_may_schedule=1.
+		 * @console_locked will be cleared.
+		 */
+		console_unlock();
 	}
 
 	printk_get_console_flush_type(&ft);
@@ -2841,6 +2873,15 @@ void console_lock(void)
 		msleep(1000);
 
 	down_console_sem();
+	if (console_suspended) {
+		printk_store(KERN_INFO "printk: %s\n", __func__);
+		/*
+		 * Keep console locked, but do not touch
+		 * @console_locked or @console_may_schedule.
+		 * (Although they will both be 1 here anyway.)
+		 */
+		return;
+	}
 	console_locked = 1;
 	console_may_schedule = 1;
 }
@@ -2861,6 +2902,15 @@ int console_trylock(void)
 		return 0;
 	if (down_trylock_console_sem())
 		return 0;
+	if (console_suspended) {
+		printk_store(KERN_INFO "printk: %s\n", __func__);
+		/*
+		 * The lock was acquired, but unlock directly and report
+		 * failure. Here console_locked=1 and console_may_schedule=1.
+		 */
+		up_console_sem();
+		return 0;
+	}
 	console_locked = 1;
 	console_may_schedule = 0;
 	return 1;
@@ -3354,6 +3404,16 @@ void console_unlock(void)
 {
 	struct console_flush_type ft;
 
+	if (console_suspended) {
+		printk_store(KERN_INFO "printk: %s\n", __func__);
+		/*
+		 * Simply unlock directly.
+		 * Here console_locked=1 and console_may_schedule=1.
+		 */
+		up_console_sem();
+		return;
+	}
+
 	printk_get_console_flush_type(&ft);
 	if (ft.legacy_direct)
 		__console_flush_and_unlock();
-- 
2.52.0
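
For anyone following the thread without a kernel tree at hand, the locking
rules that the debug patch above restores can be modeled in userspace. This
is only a rough single-threaded sketch: every model_* name is hypothetical,
the semaphore is reduced to a boolean, and flushing, panic handling and the
CON_SUSPENDED flags are left out. It shows just how the "console_suspended"
flag changes what lock, trylock and unlock do.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Single-threaded userspace model. Every model_* name is hypothetical;
 * the fields only mirror the kernel variables named in the comments.
 */
struct model_console {
	bool sem_held;      /* console_sem is currently held */
	bool locked;        /* console_locked */
	bool may_schedule;  /* console_may_schedule */
	bool suspended;     /* console_suspended added by the debug patch */
};

static void model_lock(struct model_console *c)
{
	/* The real console_lock() would sleep here; the model forbids it. */
	assert(!c->sem_held);
	c->sem_held = true;
	if (c->suspended)
		return;	/* keep the semaphore, leave the bookkeeping alone */
	c->locked = true;
	c->may_schedule = true;
}

static bool model_trylock(struct model_console *c)
{
	if (c->sem_held)
		return false;
	c->sem_held = true;
	if (c->suspended) {
		/* Acquired, but release directly and report failure. */
		c->sem_held = false;
		return false;
	}
	c->locked = true;
	c->may_schedule = false;
	return true;
}

static void model_unlock(struct model_console *c)
{
	assert(c->sem_held);
	if (c->suspended) {
		/* Release directly, without touching the bookkeeping. */
		c->sem_held = false;
		return;
	}
	/* Regular unlock; flushing of pending messages is not modeled. */
	c->locked = false;
	c->may_schedule = false;
	c->sem_held = false;
}

static void model_suspend_all(struct model_console *c)
{
	model_lock(c);
	c->suspended = true;
	/* Release directly so that "locked" stays set while suspended. */
	c->sem_held = false;
}

static void model_resume_all(struct model_console *c)
{
	assert(!c->sem_held);
	c->sem_held = true;
	c->suspended = false;
	/* Regular unlock, which clears "locked" again. */
	model_unlock(c);
}
```

In this model, a trylock between model_suspend_all() and model_resume_all()
always fails, while a blocking lock still takes and releases the semaphore
without changing the bookkeeping. That is the old behavior the patch tries
to bring back, as far as I understand it from the diff.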


