Message-ID: <Z-hYWc9LtBU1Yhtg@desktop0a>
Date: Sat, 29 Mar 2025 21:30:17 +0100
From: Roberto Ricci <io@...icci.it>
To: Baoquan He <bhe@...hat.com>
Cc: Dave Young <dyoung@...hat.com>, ebiederm@...ssion.com,
	rafael@...nel.org, pavel@....cz, ytcoode@...il.com,
	kexec@...ts.infradead.org, linux-pm@...r.kernel.org,
	akpm@...ux-foundation.org, regressions@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: [REGRESSION] Kernel booted via kexec fails to resume from
 hibernation

On 2025-03-29 09:44 +0800, Baoquan He wrote:
> On 03/29/25 at 01:14am, Roberto Ricci wrote:
> [snip]
> > Anyway, I performed yet another bisection, this time with just plain
> > defconfig plus CONFIG_KEXEC_FILE=y, and I got different results.
> > 
> > Updated steps to reproduce:
> > 1. Boot kernel >= v6.8 in a virtual machine created with this command:
> >    `qemu-system-x86_64 -enable-kvm -smp 1 -m 4.0G -hda disk.qcow2`
> > 2. Load the same kernel with:
> >    `kexec --kexec-file-syscall -l /boot/vmlinuz-6.14.0 --initrd /boot/initramfs-6.14.0.img --reuse-cmdline`
> > 3. Reboot (or call `kexec -e` directly)
> > 4. Hibernate and reboot: `printf reboot >/sys/power/disk && printf disk >/sys/power/state`
> > 5. Upon resuming, three things could happen, depending on luck:
> 
> OK, this is a little complicated. wondering why you need to do the
> hibernation and reboot. Just for curiosity.

I hibernate with the reboot option, instead of hibernating and then
booting again manually, purely for convenience during tests. The issue
occurs with a manual reboot too.
The reason I want kexec + hibernation to work is to fix a hibernation
issue on a system using ZFSBootMenu, a Linux-based bootloader which
uses kexec to boot the final OS. Other software using the same
mechanism includes Petitboot and LinuxBoot. They might be affected as
well, but I have not tested them.
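
For anyone repeating the test, steps 2 to 4 quoted above could be wrapped
in a small script along these lines. This is only a sketch: the names
KVER, DRY_RUN, run, stage_kexec and stage_hibernate are made up for
illustration, and the image paths are the ones from step 2. The kexec
and /sys/power commands themselves are taken verbatim from the steps.
Because `kexec -e` replaces the running kernel, step 4 has to be run
separately, from inside the kexec'ed kernel.

```
#!/bin/sh
# Sketch of reproduction steps 2-4. All helper names are illustrative.
KVER="${KVER:-6.14.0}"      # kernel version used in the image paths
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands (default)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Steps 2 and 3: load the same kernel via kexec_file_load, then jump to it.
stage_kexec() {
    run kexec --kexec-file-syscall -l "/boot/vmlinuz-$KVER" \
        --initrd "/boot/initramfs-$KVER.img" --reuse-cmdline &&
    run kexec -e
}

# Step 4: hibernate with the "reboot" option.
# Must be run from the kexec'ed kernel, as root.
stage_hibernate() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' 'printf reboot >/sys/power/disk && printf disk >/sys/power/state'
    else
        printf reboot >/sys/power/disk && printf disk >/sys/power/state
    fi
}
```

With the default DRY_RUN=1 the functions only print what they would do.
Setting DRY_RUN=0 and calling stage_kexec as root performs the actual
kexec; stage_hibernate is then called after the new kernel has booted.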

> > 5a. A kernel oops:
> > ```
> > [   42.574201] BUG: kernel NULL pointer dereference, address: 0000000000000000
> ...snip... 
> > I will send config and dmesg in replies to this email.
> > 
> > The bisection pointed to
> > b3ba234171cd kexec_file: load kernel at top of system RAM if required
> [snip]
> 
> I doubt how this caused the failure. I have several questions, could you
> help answer:
> 
> 1) Can this problem be stably reproduced with kexec_file_load?

Every kernel build I tested which contains that commit is affected.
However, a given build does not always produce the same one of the
three outcomes I described. E.g. first you get an oops (case 5a), then
you repeat the same steps with the same kernel image and the system
may get stuck at a black screen instead (case 5b).
But resume never fully works.

> 2) if answer to 1) is yes, can reverting b3ba234171cd fix it stably?

Yes. With the commit reverted, none of the cases 5{a,b,c} I previously
described occur, and resume seems to work fine.

> 3) If answer to 1) and 2) is yes, does kexec_load works for you? Asking
> this because kexec_load interface defaults to put kexec kernel on top of
> system RAM which is equivalent to applying commit b3ba234171cd.

No, it doesn't. While hibernation alone works, kexec + hibernation
results in the system just rebooting without resuming the hibernation
image, but no crash or other weird behaviour occurs.
Initially I decided to focus on kexec_file_load in order to narrow
things down, but that was before noticing that the bug could manifest
itself in different forms.
It is possible, indeed, that both syscalls are affected by the same
problem, which is not caused by commit b3ba234171cd.
I tried to test kexec_load with some older kernels, but I got build
errors, so I tested longterm releases where such errors have been fixed.
With v4.9.337, kexec (via kexec_load) + hibernation works.
With v5.4.291 it doesn't.
I'm not sure how bisection could be done in this case.

> 4) Can you add '-d' to 'kexec -l' to print more debugging message?

When using kexec_file_load, just these two lines get printed:

```
Try gzip decompression.
Try LZMA decompression.
```

When using kexec_load on kernel v5.4.291 (which doesn't work):
[the output is in a reply to this email]

When using kexec_load on kernel v4.9.337 (which works):
Identical to above, except for the exact hex value of some addresses.
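
A comparison of this kind ("identical except for hex addresses") can be
checked mechanically rather than by eye. The sketch below assumes the
two `kexec -d` outputs were saved to files and that the addresses are
printed with a 0x prefix; if a log prints bare hex ranges, the sed
pattern would need adjusting. The function names and the 0xADDR
placeholder are made up for illustration.

```
#!/bin/sh
# Replace every 0x-prefixed hex number in a file with a fixed placeholder,
# so that two logs differing only in addresses normalize to the same text.
normalize_hex() {
    sed -E 's/0x[0-9a-fA-F]+/0xADDR/g' "$1"
}

# Succeed (exit status 0) if the two logs are identical once addresses
# are masked, fail with 1 if they differ, 2 if a file cannot be read.
same_modulo_addresses() {
    a=$(normalize_hex "$1") || return 2
    b=$(normalize_hex "$2") || return 2
    [ "$a" = "$b" ]
}
```

For example, `same_modulo_addresses v4.9.log v5.4.log && echo same`,
where both file names are hypothetical. To see which lines actually
differ, the two normalized outputs can be written to files and passed
to diff.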

> 5) Can normal kexec trigger the failure? I mean operating kexec w/o
> the hibernation/resumption. 

No, kexec without hibernation seems to work fine, regardless of kernel
version and kexec syscall used.
