linux-kernel - Re: Regression: system freeze on resume from suspend introduced by printk per-console suspended state

Message-ID: <aXNnVgIeinG1VD00@pathway.suse.cz>
Date: Fri, 23 Jan 2026 13:19:34 +0100
From: Petr Mladek <pmladek@...e.com>
To: ysard <ysard_git@....fr>
Cc: John Ogness <john.ogness@...utronix.de>, linux-kernel@...r.kernel.org,
	senozhatsky@...omium.org
Subject: Re: Regression: system freeze on resume from suspend introduced by
 printk per-console suspended state

On Fri 2026-01-23 08:44:39, ysard wrote:
> Good evening, thank you for your reply and the patch.
> 
> 
> Summary
> ======
> 
> The patch does not seem to have any effect on the problem, *but* I have found a
> way to temporarily fix the freeze by disabling the `nvidia-suspend` service.

Great catch!

> Additional info for diagnostics
> ===============================
> 
> $ cat /proc/driver/nvidia/version
> NVRM version: NVIDIA UNIX x86_64 Kernel Module  470.256.02  Thu May  2 14:37:44 UTC 2024
> GCC version:  gcc version 12.5.0 (Debian 12.5.0-6)
> 
> $ nvcc --version
> nvcc: NVIDIA (R) Cuda compiler driver
> Copyright (c) 2005-2019 NVIDIA Corporation
> Built on Wed_Oct_23_19:24:38_PDT_2019
> Cuda compilation tools, release 10.2, V10.2.89
> 
> 
> Procedure requested
> ===================
> 
> > I have attached a patch (based on 6.19-rc4). It should restore the old
> > console_lock behavior during suspend/resume. Assuming this works for
> > you, it also adds some debugging information so that we can figure out
> > who is locking the console.
> 
> I applied the patch. The behavior is the same as before (no resume).
> 
> $ uname -r
> 6.19.0-rc4-dirty
> 
> $ dmesg | grep printk
> [    0.030102] [      T0] printk: log buffer data + meta data: 131072 + 458752 = 589824 bytes
> [    0.077779] [      T0] printk: legacy console [tty0] enabled
> [  152.678589] [   T1349] printk: Suspending console(s) (use no_console_suspend to debug)
> ...
> no resume
> 
> 
> Temporary solution
> ==================
> 
> I had the idea of rebooting into recovery mode (rescue.target) to run the test.
> The `systemctl suspend` command is not available in this mode, which forced me
> to use `pm-suspend` instead. `pm-suspend` sleeps and resumes properly on every
> kernel version I have been able to test so far.
> 
> Before actually going to sleep, systemd starts a number of services,
> including nvidia-suspend.service, which I disabled
> ("because it's always nvidia").
> 
> The following command restores normal operation of `systemctl suspend`,
> even on the first bad commit found by the bisection
> (9e70a5e109a4a23367810de09be826c52d27ee2f).
> 
> $ systemctl disable nvidia-suspend.service
> 
> This service calls the script `/usr/bin/nvidia-sleep.sh`, which seems to switch
> VT consoles and to expect that they are still usable (`chvt 63`?):

I ran chvt under strace, and it does something like:

openat(AT_FDCWD, "/dev/tty0", O_RDWR)   = 3
ioctl(3, TCGETS2, {c_iflag=IGNBRK|IGNPAR, c_oflag=NL0|CR0|TAB0|BS0|VT0|FF0|, c_cflag=B38400|CS8|CREAD, c_lflag=, ...}) = 0
ioctl(3, KDGKBTYPE, [KB_101])           = 0
rt_sigaction(SIGALRM, {sa_handler=0x556bcf0418e0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f3e60a42910}, NULL, 8) = 0
timer_create(CLOCK_REALTIME, {sigev_value={sival_int=1668762552, sival_ptr=0x7ffc63774bb8}, sigev_signo=SIGALRM, sigev_notify=SIGEV_SIGNAL}, [0]) = 0
timer_settime(0, 0, {it_interval={tv_sec=1, tv_nsec=0}, it_value={tv_sec=1, tv_nsec=0}}, NULL) = 0
ioctl(3, VT_ACTIVATE, 0x3f)             = 0
ioctl(3, VT_WAITACTIVE, 0x3f)           = 0

And vt_ioctl(,,VT_ACTIVATE) calls vc_allocate(arg) under
console_lock()...

The commit 9e70a5e109a4a233 ("printk: Add per-console suspended
state") makes some changes to console_lock(). It now sets:

	console_locked = 1;
	console_may_schedule = 1;

I can't see how this might cause the freeze.

Well, for example, it would allow console_conditional_schedule()
to sleep. And it might be harder to obtain the lock when
the current owner is sleeping.

But it would only have an effect in a CONFIG_PREEMPT_VOLUNTARY kernel.
With the other CONFIG_PREEMPT modes, the process might get scheduled
even without cond_resched().

Also, I would expect userspace to wait until the services have
finished their job before suspending the kernel.

>     #!/bin/bash
> 
>     if [ ! -f /proc/driver/nvidia/suspend ]; then
>         exit 0
>     fi
> 
>     RUN_DIR="/var/run/nvidia-sleep"
>     XORG_VT_FILE="${RUN_DIR}"/Xorg.vt_number
> 
>     PATH="/bin:/usr/bin"
> 
>     case "$1" in
>         suspend|hibernate)
>             mkdir -p "${RUN_DIR}"
>             fgconsole > "${XORG_VT_FILE}"
>             chvt 63
>             if [[ $? -ne 0 ]]; then
>                 exit $?
>             fi
>             echo "$1" > /proc/driver/nvidia/suspend
>             exit $?
>             ;;
>         resume)
>             echo "$1" > /proc/driver/nvidia/suspend
>             #
>             # Check if Xorg was determined to be running at the time
>             # of suspend, and whether its VT was recorded.  If so,
>             # attempt to switch back to this VT.
>             #
>             if [[ -f "${XORG_VT_FILE}" ]]; then
>                 XORG_PID=$(cat "${XORG_VT_FILE}")
>                 rm "${XORG_VT_FILE}"
>                 chvt "${XORG_PID}"
>             fi
>             exit 0

I just wonder: could you please try commenting out the various
commands here and bisect whether the problem is caused by the
"fgconsole", "chvt", or "echo XXX >/proc/driver/nvidia/suspend"
command?

I mean: disable the counterparts in both the suspend and resume
code paths and check whether the freeze is still reproducible.
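A minimal sketch of that experiment, assuming a modified copy of the script is used in place of /usr/bin/nvidia-sleep.sh. SKIP_FGCONSOLE, SKIP_CHVT and SKIP_PROC are hypothetical switches (not part of the NVIDIA script) that each drop one step from both the suspend and the resume path; PROC_FILE and RUN_DIR are hypothetical overrides that only exist so the logic can be tried against scratch files. Each step is logged before it runs, so after a freeze the last message that reached the journal, if any, may hint at the step involved. Unlike the quoted script, the sketch propagates chvt's exit status with `|| return`: there, `exit $?` inside `if [[ $? -ne 0 ]]` exits with the status of the `[[` test, which is 0, so a failing chvt is reported as success.

```bash
#!/bin/bash
# Hypothetical bisection copy of nvidia-sleep.sh (a sketch, not NVIDIA's
# script). SKIP_FGCONSOLE=1, SKIP_CHVT=1 and SKIP_PROC=1 each leave out one
# step in both the suspend and the resume path. PROC_FILE and RUN_DIR can
# point at scratch locations for a dry test.

PROC_FILE="${PROC_FILE:-/proc/driver/nvidia/suspend}"
RUN_DIR="${RUN_DIR:-/var/run/nvidia-sleep}"
XORG_VT_FILE="${RUN_DIR}/Xorg.vt_number"

# Log each step before it runs; under systemd, stdout normally ends up
# in the journal.
log() { echo "nvidia-sleep-bisect: $*"; }

nv_sleep() {
    # Same early exit as the original when the driver lacks the interface.
    [[ -f "${PROC_FILE}" ]] || return 0

    case "$1" in
        suspend|hibernate)
            mkdir -p "${RUN_DIR}" || return
            if [[ "${SKIP_FGCONSOLE:-0}" != 1 ]]; then
                log "fgconsole"
                fgconsole > "${XORG_VT_FILE}" || return
            fi
            if [[ "${SKIP_CHVT:-0}" != 1 ]]; then
                log "chvt 63"
                chvt 63 || return   # keeps chvt's own exit status
            fi
            if [[ "${SKIP_PROC:-0}" != 1 ]]; then
                log "echo $1 > ${PROC_FILE}"
                echo "$1" > "${PROC_FILE}" || return
            fi
            ;;
        resume)
            if [[ "${SKIP_PROC:-0}" != 1 ]]; then
                log "echo $1 > ${PROC_FILE}"
                echo "$1" > "${PROC_FILE}" || return
            fi
            # Switch back to the VT recorded at suspend time, if any.
            if [[ -f "${XORG_VT_FILE}" ]]; then
                local vt
                vt=$(< "${XORG_VT_FILE}")
                rm -f "${XORG_VT_FILE}"
                if [[ "${SKIP_CHVT:-0}" != 1 ]]; then
                    log "chvt ${vt}"
                    chvt "${vt}"
                fi
            fi
            ;;
        *)
            return 1
            ;;
    esac
}

# Act only when executed with an argument, as the systemd units do.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
    nv_sleep "$1"
    exit $?
fi
```

For example, `SKIP_CHVT=1 ./nvidia-sleep-bisect.sh suspend` (file name hypothetical) would run every step except the VT switch.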

>             ;;
>         *)
>             exit 1
>     esac
> 
> 
> Conclusion
> ==========
> 
> kernel                   nvidia-suspend (systemd 259~rc1-1)  result
> before 9e70a5e109a4      enabled                             ok
> before 9e70a5e109a4      disabled                            ok
> 9e70a5e109a4 and later   enabled                             freeze
> 9e70a5e109a4 and later   disabled                            ok
> 
> - Reactivating this service causes the freeze to reappear in a reproducible pattern.
> - The `pm-suspend` command has never stopped working.
> 
> It seems that this is a two-sided problem?
> If the kernel is not the issue, I apologize for wasting your time;
> I should have thought about the layers added by systemd sooner.

There is no need to apologize. It seems to be somehow affected
by the kernel commit. And it would be great to understand what
is going on.

> Extra
> =====
> 
> During my tests with 6.19.0-rc1 and 6.19.0-rc4, I noticed that resuming from a
> sleep test, which worked in 6.18.2, now fails. I think this is unrelated and
> caused by a different issue; I am noting it for the record.
> 
> $ echo core > /sys/power/pm_test
> $ echo deep > /sys/power/mem_sleep
> 
> Both `pm-suspend` and `systemctl suspend` have the same effect:
> 
> - Trigger suspend (`kernel: PM: suspend entry (deep)` in dmesg);
> - No response when pressing the power button to wake up;
> - Force shutdown by holding down the power button;
> - The computer shuts down but the motherboard indicates a state similar to
>  sleep mode (LED flashing);
> - Pressing the power button starts the computer (fans + HDD spin up) for a
>  fraction of a second (<1s), then the machine shuts down;
> - Pressing the power button again starts the machine normally
>  (not a resume from sleep mode).
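The pm_test and mem_sleep settings stay in effect until they are changed again (or the machine reboots), so it can help to check them before and after each run. Both sysfs files list every accepted value and mark the active one in square brackets (for example `s2idle [deep]` for mem_sleep). A minimal sketch with two hypothetical helper functions, one that prints the bracketed entry and one that writes `none` back to pm_test:

```bash
#!/bin/bash
# Sketch: inspect and undo the pm_test setting used in the test above.
# The function names are hypothetical helpers, not kernel or distro tools.

# Print the active (bracketed) entry of a sysfs selection file such as
# /sys/power/pm_test ("[none] core processors ...") or
# /sys/power/mem_sleep ("s2idle [deep]").
active_entry() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Switch pm_test back to "none" so that the next suspend is a real one.
# Needs root when pointed at /sys/power/pm_test.
reset_pm_test() {
    local file="${1:-/sys/power/pm_test}"
    echo none > "$file"
}
```

After the two `echo` commands above, `active_entry /sys/power/pm_test` should print `core`, and running `reset_pm_test` as root restores normal suspend behavior.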

This would deserve a separate thread. It would be great if you could
bisect it to the problematic commit.

Best Regards,
Petr