Message-ID: <20241025231320.45417-1-kuniyu@amazon.com>
Date: Fri, 25 Oct 2024 16:13:20 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <x86@...nel.org>, <linux-edac@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
CC: Tony Luck <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>, "Thomas
 Gleixner" <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Dave Hansen
	<dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, "Benjamin
 Herrenschmidt" <benh@...zon.com>, Kuniyuki Iwashima <kuniyu@...zon.com>
Subject: WARNING in lmce_supported() during reboot.

Hello x86 maintainers,

We have seen the splat below a few times when simply rebooting hosts.

It happens rarely and seems to be timing-related, so we don't have a
reproducer.

The kernel source that produced the splat is here:
https://github.com/amazonlinux/linux/tree/kernel-6.1.61-85.141.amzn2023

and the triggered WARN_ON_ONCE() in lmce_supported() is here:
https://github.com/amazonlinux/linux/blob/kernel-6.1.61-85.141.amzn2023/arch/x86/kernel/cpu/mce/intel.c#L124

Do you have any hints?

Thanks in advance.


ACPI: PM: Preparing to enter system sleep state S5
reboot: Restarting system
reboot: machine restart
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/mce/intel.c:124 lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99) 
Modules linked in: ib_core binfmt_misc ext4 crc16 mbcache jbd2 sunrpc mousedev atkbd psmouse ghash_clmulni_intel vivaldi_fmap libps2 aesni_intel crypto_simd cryptd i8042 serio ena button sch_fq_codel dm_mod fuse configfs dax loop dmi_sysfs simpledrm drm_shmem_helper drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm i2c_core drm_panel_orientation_quirks backlight fb crc32_pclmul crc32c_intel fbdev efivarfs
Hardware name: Amazon EC2 c6i.4xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99) 
Code: 81 fb 00 00 00 09 75 da b9 3a 00 00 00 0f 32 48 c1 e2 20 48 09 c2 48 89 d3 66 90 48 89 d8 48 c1 e8 14 83 e0 01 83 e3 01 75 ba <0f> 0b 31 c0 eb b4 31 d2 48 89 de bf 3a 00 00 00 e8 6b e6 57 00 eb
All code
========
   0:	81 fb 00 00 00 09    	cmp    $0x9000000,%ebx
   6:	75 da                	jne    0xffffffffffffffe2
   8:	b9 3a 00 00 00       	mov    $0x3a,%ecx
   d:	0f 32                	rdmsr
   f:	48 c1 e2 20          	shl    $0x20,%rdx
  13:	48 09 c2             	or     %rax,%rdx
  16:	48 89 d3             	mov    %rdx,%rbx
  19:	66 90                	xchg   %ax,%ax
  1b:	48 89 d8             	mov    %rbx,%rax
  1e:	48 c1 e8 14          	shr    $0x14,%rax
  22:	83 e0 01             	and    $0x1,%eax
  25:	83 e3 01             	and    $0x1,%ebx
  28:	75 ba                	jne    0xffffffffffffffe4
  2a:*	0f 0b                	ud2		<-- trapping instruction
  2c:	31 c0                	xor    %eax,%eax
  2e:	eb b4                	jmp    0xffffffffffffffe4
  30:	31 d2                	xor    %edx,%edx
  32:	48 89 de             	mov    %rbx,%rsi
  35:	bf 3a 00 00 00       	mov    $0x3a,%edi
  3a:	e8 6b e6 57 00       	call   0x57e6aa
  3f:	eb                   	.byte 0xeb

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2
   2:	31 c0                	xor    %eax,%eax
   4:	eb b4                	jmp    0xffffffffffffffba
   6:	31 d2                	xor    %edx,%edx
   8:	48 89 de             	mov    %rbx,%rsi
   b:	bf 3a 00 00 00       	mov    $0x3a,%edi
  10:	e8 6b e6 57 00       	call   0x57e680
  15:	eb                   	.byte 0xeb
RSP: 0018:ffffa18f00154fb8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000003a
RDX: 0000000000000000 RSI: 00000000000000ff RDI: ffff965cfe2599c0
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: ffffa18f00154ff8 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff965cfe240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8485dfba30 CR3: 0000000389a10003 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) 
? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) 
? mce_intel_feature_clear (arch/x86/kernel/cpu/mce/intel.c:465 arch/x86/kernel/cpu/mce/intel.c:502) 
? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99) 
? __warn (kernel/panic.c:672) 
? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99) 
? report_bug (lib/bug.c:201 lib/bug.c:219) 
? handle_bug (arch/x86/kernel/traps.c:324) 
? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1)) 
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568) 
? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99) 
? clear_local_APIC (./arch/x86/include/asm/apic.h:393 arch/x86/kernel/apic/apic.c:1192) 
mce_intel_feature_clear (arch/x86/kernel/cpu/mce/intel.c:465 arch/x86/kernel/cpu/mce/intel.c:502) 
stop_this_cpu (arch/x86/kernel/process.c:780) 
__sysvec_reboot (arch/x86/kernel/smp.c:140) 
sysvec_reboot (arch/x86/kernel/smp.c:136 (discriminator 14)) 
</IRQ>
<TASK>
asm_sysvec_reboot (./arch/x86/include/asm/idtentry.h:656) 
RIP: 0010:acpi_idle_do_entry (./arch/x86/include/asm/irqflags.h:40 ./arch/x86/include/asm/irqflags.h:75 drivers/acpi/processor_idle.c:113 drivers/acpi/processor_idle.c:572) 
Code: 75 08 48 8b 15 b1 81 df 02 ed c3 cc cc cc cc 65 48 8b 04 25 00 ff 01 00 48 8b 00 a8 08 75 eb 66 90 0f 00 2d 58 c8 6a 00 fb f4 <fa> c3 cc cc cc cc e9 01 fc ff ff 90 0f 1f 44 00 00 41 56 41 55 41
All code
========
   0:	75 08                	jne    0xa
   2:	48 8b 15 b1 81 df 02 	mov    0x2df81b1(%rip),%rdx        # 0x2df81ba
   9:	ed                   	in     (%dx),%eax
   a:	c3                   	ret
   b:	cc                   	int3
   c:	cc                   	int3
   d:	cc                   	int3
   e:	cc                   	int3
   f:	65 48 8b 04 25 00 ff 	mov    %gs:0x1ff00,%rax
  16:	01 00 
  18:	48 8b 00             	mov    (%rax),%rax
  1b:	a8 08                	test   $0x8,%al
  1d:	75 eb                	jne    0xa
  1f:	66 90                	xchg   %ax,%ax
  21:	0f 00 2d 58 c8 6a 00 	verw   0x6ac858(%rip)        # 0x6ac880
  28:	fb                   	sti
  29:	f4                   	hlt
  2a:*	fa                   	cli		<-- trapping instruction
  2b:	c3                   	ret
  2c:	cc                   	int3
  2d:	cc                   	int3
  2e:	cc                   	int3
  2f:	cc                   	int3
  30:	e9 01 fc ff ff       	jmp    0xfffffffffffffc36
  35:	90                   	nop
  36:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  3b:	41 56                	push   %r14
  3d:	41 55                	push   %r13
  3f:	41                   	rex.B

Code starting with the faulting instruction
===========================================
   0:	fa                   	cli
   1:	c3                   	ret
   2:	cc                   	int3
   3:	cc                   	int3
   4:	cc                   	int3
   5:	cc                   	int3
   6:	e9 01 fc ff ff       	jmp    0xfffffffffffffc0c
   b:	90                   	nop
   c:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
  11:	41 56                	push   %r14
  13:	41 55                	push   %r13
  15:	41                   	rex.B
RSP: 0018:ffffa18f000afe70 EFLAGS: 00000246
RAX: 0000000000004000 RBX: ffff965603d92400 RCX: 4000000000000000
RDX: ffff965cfe240000 RSI: ffff965601478800 RDI: ffff965601478864
RBP: 0000000000000001 R08: ffffffffb62182c0 R09: 0000000000000000
R10: 0000000000002703 R11: 000000000001993d R12: 0000000000000001
R13: ffffffffb6218340 R14: 0000000000000001 R15: 0000000000000000
acpi_idle_enter (drivers/acpi/processor_idle.c:711 (discriminator 3)) 
cpuidle_enter_state (drivers/cpuidle/cpuidle.c:239) 
cpuidle_enter (drivers/cpuidle/cpuidle.c:358) 
cpuidle_idle_call (kernel/sched/idle.c:240) 
do_idle (kernel/sched/idle.c:305) 
cpu_startup_entry (kernel/sched/idle.c:400 (discriminator 1)) 
start_secondary (arch/x86/kernel/smpboot.c:215 arch/x86/kernel/smpboot.c:249) 
secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358) 
</TASK>
---[ end trace 0000000000000000 ]---
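
One possible reading of the register dump, sketched as a small Python helper. It assumes, based only on the disassembly above, that the warning path reads MSR 0x3a, tests bit 0 as a "locked" bit and bit 20 as an "LMCE enabled" bit, and hits the ud2 when bit 0 is clear. The constant and function names are illustrative, not taken from the kernel source. RDX holds the combined rdmsr result ("shl $0x20,%rdx; or %rax,%rdx") and is not written again before the ud2, so the dumped value of 0 suggests the MSR read back as all zeroes.

```python
# Bit positions inferred from the disassembly in the splat. The names are
# hypothetical labels for illustration, not copied from kernel headers.
MSR_FEAT_CTL = 0x3A               # "mov $0x3a,%ecx; rdmsr"
FEAT_CTL_LOCKED = 1 << 0          # "and $0x1,%ebx"
FEAT_CTL_LMCE_ENABLED = 1 << 20   # "shr $0x14,%rax; and $0x1,%eax"


def decode_feat_ctl(value):
    """Return (locked, lmce_enabled, would_warn) for a raw MSR 0x3a value."""
    locked = bool(value & FEAT_CTL_LOCKED)
    lmce_enabled = bool(value & FEAT_CTL_LMCE_ENABLED)
    # In the disassembly, the ud2 is reached only when bit 0 is clear
    # ("and $0x1,%ebx; jne ..." falls through to ud2).
    would_warn = not locked
    return locked, lmce_enabled, would_warn


if __name__ == "__main__":
    # RDX from the first register dump in the splat.
    rdx_from_splat = 0x0000000000000000
    print(decode_feat_ctl(rdx_from_splat))
```

With the RDX value from the splat, the helper reports that both bits are clear, which matches RAX and RBX also being 0 at the trapping instruction.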
