Message-ID: <Z4WFjBVHpndct7br@desktop0a>
Date: Mon, 13 Jan 2025 22:28:44 +0100
From: Roberto Ricci <io@...icci.it>
To: ebiederm@...ssion.com, rafael@...nel.org, pavel@....cz,
	ytcoode@...il.com
Cc: kexec@...ts.infradead.org, linux-pm@...r.kernel.org,
	akpm@...ux-foundation.org, regressions@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: [REGRESSION] Kernel booted via kexec fails to resume from hibernation

After booting a kernel via kexec, hibernating, and rebooting the machine, the following oops occurs:

```
[   88.485216] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000940: 0000 [#1] PREEMPT SMP KASAN PTI
[   88.485233] KASAN: probably user-memory-access in range [0x0000000000004a00-0x0000000000004a07]
[   88.485240] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Kdump: loaded Not tainted 6.13.0-rc7_ricci #1
[   88.485245] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[   88.485252] RIP: 0010:next_zone (mm/mmzone.c:20 mm/mmzone.c:37)
[   88.485270] Code: 73 10 48 05 c0 06 00 00 48 83 c4 08 5b c3 cc cc cc cc 48 8d bb 00 4a 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 9d 00 00 00 8b 8b 00 4a 00 00
All code
========
   0:	73 10                	jae    0x12
   2:	48 05 c0 06 00 00    	add    $0x6c0,%rax
   8:	48 83 c4 08          	add    $0x8,%rsp
   c:	5b                   	pop    %rbx
   d:	c3                   	ret
   e:	cc                   	int3
   f:	cc                   	int3
  10:	cc                   	int3
  11:	cc                   	int3
  12:	48 8d bb 00 4a 00 00 	lea    0x4a00(%rbx),%rdi
  19:	48 b8 00 00 00 00 00 	movabs $0xdffffc0000000000,%rax
  20:	fc ff df 
  23:	48 89 fa             	mov    %rdi,%rdx
  26:	48 c1 ea 03          	shr    $0x3,%rdx
  2a:*	0f b6 04 02          	movzbl (%rdx,%rax,1),%eax		<-- trapping instruction
  2e:	84 c0                	test   %al,%al
  30:	74 08                	je     0x3a
  32:	3c 03                	cmp    $0x3,%al
  34:	0f 8e 9d 00 00 00    	jle    0xd7
  3a:	8b 8b 00 4a 00 00    	mov    0x4a00(%rbx),%ecx

Code starting with the faulting instruction
===========================================
   0:	0f b6 04 02          	movzbl (%rdx,%rax,1),%eax
   4:	84 c0                	test   %al,%al
   6:	74 08                	je     0x10
   8:	3c 03                	cmp    $0x3,%al
   a:	0f 8e 9d 00 00 00    	jle    0xad
  10:	8b 8b 00 4a 00 00    	mov    0x4a00(%rbx),%ecx
[   88.485275] RSP: 0018:ffffffffa4807ce8 EFLAGS: 00010002
[   88.485279] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 1ffff11027fff565
[   88.485281] RDX: 0000000000000940 RSI: ffffffffa3a89b80 RDI: 0000000000004a00
[   88.485283] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffed10234c82c8
[   88.485285] R10: ffff88811a641647 R11: ffff88811a635e30 R12: 0000000000000000
[   88.485287] R13: 1ffffffff4839048 R14: 0000000000000000 R15: 000000000000003d
[   88.485290] FS:  0000000000000000(0000) GS:ffff88811a600000(0000) knlGS:0000000000000000
[   88.485292] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   88.485294] CR2: 000055e8c586c300 CR3: 0000000106eb0000 CR4: 00000000000006f0
[   88.485299] Call Trace:
[   88.485301]  <TASK>
[   88.485306] ? die_addr (arch/x86/kernel/dumpstack.c:421 arch/x86/kernel/dumpstack.c:460)
[   88.485313] ? exc_general_protection (arch/x86/kernel/traps.c:751 arch/x86/kernel/traps.c:693)
[   88.485319] ? asm_exc_general_protection (./arch/x86/include/asm/idtentry.h:617)
[   88.485324] ? next_zone (mm/mmzone.c:20 mm/mmzone.c:37)
[   88.485336] ? calc_load_nohz_start (kernel/sched/loadavg.c:251 (discriminator 2))
[   88.485341] need_update (mm/vmstat.c:2032 (discriminator 2))
[   88.485366] quiet_vmstat (mm/vmstat.c:2065 (discriminator 2))
[   88.485369] tick_nohz_stop_tick (./include/linux/hrtimer.h:135 kernel/time/tick-sched.c:1044)
[   88.485373] ? __pfx_tick_nohz_stop_tick (kernel/time/tick-sched.c:970)
[   88.485376] ? tick_nohz_next_event (kernel/time/tick-sched.c:952 (discriminator 2))
[   88.485379] ? __pfx_tsc_verify_tsc_adjust (arch/x86/kernel/tsc_sync.c:51)
[   88.485396] tick_nohz_idle_stop_tick (kernel/time/tick-sched.c:1229)
[   88.485399] do_idle (kernel/sched/idle.c:185 kernel/sched/idle.c:325)
[   88.485403] ? __pfx_do_idle (kernel/sched/idle.c:253)
[   88.485406] cpu_startup_entry (kernel/sched/idle.c:422)
[   88.485409] rest_init (init/main.c:720)
[   88.485413] ? acpi_subsystem_init (drivers/acpi/bus.c:1314)
[   88.485417] start_kernel (init/main.c:1000)
[   88.485422] x86_64_start_reservations (arch/x86/kernel/head64.c:495)
[   88.485426] x86_64_start_kernel (??:?)
[   88.485432] common_startup_64 (arch/x86/kernel/head_64.S:415)
[   88.485437]  </TASK>
[   88.485439] Modules linked in: cfg80211 8021q garp stp mrp llc ppdev evdev input_leds intel_agp e1000 mac_hid intel_gtt pcspkr i2c_piix4 agpgart i2c_smbus parport_pc parport tiny_power_button button rfkill vhost_vsock vmw_vsock_virtio_transport_common vsock vhost_net vhost vhost_iotlb tap vfio_iommu_type1 vfio iommufd uhid hid dm_mod uinput userio ppp_generic slhc tun loop cuse fuse ext4 crc32c_generic crc16 mbcache jbd2 bochs drm_client_lib drm_shmem_helper sd_mod drm_kms_helper ata_generic pata_acpi ata_piix libata drm scsi_mod serio_raw scsi_common qemu_fw_cfg
```
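For readers decoding the oops above: the faulting address looks like a KASAN shadow address rather than the address the code meant to touch. The sketch below redoes the arithmetic from the disassembly (`lea 0x4a00(%rbx),%rdi`, `shr $0x3,%rdx`, then a load from `(%rdx,%rax,1)`), using only the register values and the shadow constant shown in the log. The helper names are hypothetical, and reading RBX = 0 as a NULL base pointer with a field at offset 0x4a00 is an interpretation, not something the log states outright.

```python
# Constants taken from the oops: RAX holds the shadow offset loaded by the
# movabs instruction, and the shr $0x3 gives the shadow scale shift.
KASAN_SHADOW_OFFSET = 0xdffffc0000000000
KASAN_SHADOW_SCALE_SHIFT = 3
MASK64 = 0xFFFFFFFFFFFFFFFF


def kasan_shadow_addr(addr):
    """Shadow byte address that the instrumented code reads for `addr`."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


def covered_range(shadow):
    """Inverse mapping: the 8-byte range described by one shadow byte."""
    base = ((shadow - KASAN_SHADOW_OFFSET) & MASK64) << KASAN_SHADOW_SCALE_SHIFT
    return base, base + 7


# Register values from the oops (RBX = 0, displacement 0x4a00 from the lea).
rbx = 0x0
rdi = rbx + 0x4a00

shadow = kasan_shadow_addr(rdi)
low, high = covered_range(shadow)

print(hex(shadow))          # address reported as non-canonical in the oops
print(hex(low), hex(high))  # range reported on the KASAN line
```

Under these assumptions, the computed shadow address matches the non-canonical address in the first oops line and the range matches the one KASAN prints, which is consistent with the code dereferencing a NULL-based pointer at offset 0x4a00.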

I can reproduce this with kernel 6.13-rc7 in a qemu x86_64 virtual machine
running Void Linux, with the following commands:

```
# kexec -l /boot/vmlinuz-6.13.0-rc7 --initrd=/boot/initramfs-6.13.0-rc7 --reuse-cmdline
# reboot
# printf reboot >/sys/power/disk
# printf disk >/sys/power/state
```

If kexec is not used, hibernation works fine.

This has been happening since the 6.8 series; 6.7 works fine.
I performed a bisection and it pointed to
18d565ea95fe ("kexec_file: fix incorrect temp_start value in locate_mem_hole_top_down()").

#regzbot introduced: 18d565ea95fe553f442c5bbc5050415bab3c3fa4

I will send the kernel config and dmesg in replies to this email.
