Message-ID: <20241025231320.45417-1-kuniyu@amazon.com>
Date: Fri, 25 Oct 2024 16:13:20 -0700
From: Kuniyuki Iwashima <kuniyu@...zon.com>
To: <x86@...nel.org>, <linux-edac@...r.kernel.org>,
<linux-kernel@...r.kernel.org>
CC: Tony Luck <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>, "Thomas
Gleixner" <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Dave Hansen
<dave.hansen@...ux.intel.com>, "H. Peter Anvin" <hpa@...or.com>, "Benjamin
Herrenschmidt" <benh@...zon.com>, Kuniyuki Iwashima <kuniyu@...zon.com>
Subject: WARNING in lmce_supported() during reboot.
Hello x86 maintainers,
We have seen the splat below a few times when simply rebooting hosts.
It happens rarely and seems to be timing-related, so we don't have a
reproducer.
The kernel source that produced the splat is here:
https://github.com/amazonlinux/linux/tree/kernel-6.1.61-85.141.amzn2023
and the WARN_ON_ONCE() that triggered in lmce_supported() is here:
https://github.com/amazonlinux/linux/blob/kernel-6.1.61-85.141.amzn2023/arch/x86/kernel/cpu/mce/intel.c#L124
Do you have any hints?
Thanks in advance.
ACPI: PM: Preparing to enter system sleep state S5
reboot: Restarting system
reboot: machine restart
------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/mce/intel.c:124 lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99)
Modules linked in: ib_core binfmt_misc ext4 crc16 mbcache jbd2 sunrpc mousedev atkbd psmouse ghash_clmulni_intel vivaldi_fmap libps2 aesni_intel crypto_simd cryptd i8042 serio ena button sch_fq_codel dm_mod fuse configfs dax loop dmi_sysfs simpledrm drm_shmem_helper drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea drm i2c_core drm_panel_orientation_quirks backlight fb crc32_pclmul crc32c_intel fbdev efivarfs
Hardware name: Amazon EC2 c6i.4xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99)
Code: 81 fb 00 00 00 09 75 da b9 3a 00 00 00 0f 32 48 c1 e2 20 48 09 c2 48 89 d3 66 90 48 89 d8 48 c1 e8 14 83 e0 01 83 e3 01 75 ba <0f> 0b 31 c0 eb b4 31 d2 48 89 de bf 3a 00 00 00 e8 6b e6 57 00 eb
All code
========
0: 81 fb 00 00 00 09 cmp $0x9000000,%ebx
6: 75 da jne 0xffffffffffffffe2
8: b9 3a 00 00 00 mov $0x3a,%ecx
d: 0f 32 rdmsr
f: 48 c1 e2 20 shl $0x20,%rdx
13: 48 09 c2 or %rax,%rdx
16: 48 89 d3 mov %rdx,%rbx
19: 66 90 xchg %ax,%ax
1b: 48 89 d8 mov %rbx,%rax
1e: 48 c1 e8 14 shr $0x14,%rax
22: 83 e0 01 and $0x1,%eax
25: 83 e3 01 and $0x1,%ebx
28: 75 ba jne 0xffffffffffffffe4
2a:* 0f 0b ud2 <-- trapping instruction
2c: 31 c0 xor %eax,%eax
2e: eb b4 jmp 0xffffffffffffffe4
30: 31 d2 xor %edx,%edx
32: 48 89 de mov %rbx,%rsi
35: bf 3a 00 00 00 mov $0x3a,%edi
3a: e8 6b e6 57 00 call 0x57e6aa
3f: eb .byte 0xeb
Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: 31 c0 xor %eax,%eax
4: eb b4 jmp 0xffffffffffffffba
6: 31 d2 xor %edx,%edx
8: 48 89 de mov %rbx,%rsi
b: bf 3a 00 00 00 mov $0x3a,%edi
10: e8 6b e6 57 00 call 0x57e680
15: eb .byte 0xeb
RSP: 0018:ffffa18f00154fb8 EFLAGS: 00010046
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000003a
RDX: 0000000000000000 RSI: 00000000000000ff RDI: ffff965cfe2599c0
RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: ffffa18f00154ff8 R12: 0000000000000001
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff965cfe240000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8485dfba30 CR3: 0000000389a10003 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259)
? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259)
? mce_intel_feature_clear (arch/x86/kernel/cpu/mce/intel.c:465 arch/x86/kernel/cpu/mce/intel.c:502)
? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99)
? __warn (kernel/panic.c:672)
? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99)
? report_bug (lib/bug.c:201 lib/bug.c:219)
? handle_bug (arch/x86/kernel/traps.c:324)
? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1))
? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568)
? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124 arch/x86/kernel/cpu/mce/intel.c:99)
? clear_local_APIC (./arch/x86/include/asm/apic.h:393 arch/x86/kernel/apic/apic.c:1192)
mce_intel_feature_clear (arch/x86/kernel/cpu/mce/intel.c:465 arch/x86/kernel/cpu/mce/intel.c:502)
stop_this_cpu (arch/x86/kernel/process.c:780)
__sysvec_reboot (arch/x86/kernel/smp.c:140)
sysvec_reboot (arch/x86/kernel/smp.c:136 (discriminator 14))
</IRQ>
<TASK>
asm_sysvec_reboot (./arch/x86/include/asm/idtentry.h:656)
RIP: 0010:acpi_idle_do_entry (./arch/x86/include/asm/irqflags.h:40 ./arch/x86/include/asm/irqflags.h:75 drivers/acpi/processor_idle.c:113 drivers/acpi/processor_idle.c:572)
Code: 75 08 48 8b 15 b1 81 df 02 ed c3 cc cc cc cc 65 48 8b 04 25 00 ff 01 00 48 8b 00 a8 08 75 eb 66 90 0f 00 2d 58 c8 6a 00 fb f4 <fa> c3 cc cc cc cc e9 01 fc ff ff 90 0f 1f 44 00 00 41 56 41 55 41
All code
========
0: 75 08 jne 0xa
2: 48 8b 15 b1 81 df 02 mov 0x2df81b1(%rip),%rdx # 0x2df81ba
9: ed in (%dx),%eax
a: c3 ret
b: cc int3
c: cc int3
d: cc int3
e: cc int3
f: 65 48 8b 04 25 00 ff mov %gs:0x1ff00,%rax
16: 01 00
18: 48 8b 00 mov (%rax),%rax
1b: a8 08 test $0x8,%al
1d: 75 eb jne 0xa
1f: 66 90 xchg %ax,%ax
21: 0f 00 2d 58 c8 6a 00 verw 0x6ac858(%rip) # 0x6ac880
28: fb sti
29: f4 hlt
2a:* fa cli <-- trapping instruction
2b: c3 ret
2c: cc int3
2d: cc int3
2e: cc int3
2f: cc int3
30: e9 01 fc ff ff jmp 0xfffffffffffffc36
35: 90 nop
36: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
3b: 41 56 push %r14
3d: 41 55 push %r13
3f: 41 rex.B
Code starting with the faulting instruction
===========================================
0: fa cli
1: c3 ret
2: cc int3
3: cc int3
4: cc int3
5: cc int3
6: e9 01 fc ff ff jmp 0xfffffffffffffc0c
b: 90 nop
c: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
11: 41 56 push %r14
13: 41 55 push %r13
15: 41 rex.B
RSP: 0018:ffffa18f000afe70 EFLAGS: 00000246
RAX: 0000000000004000 RBX: ffff965603d92400 RCX: 4000000000000000
RDX: ffff965cfe240000 RSI: ffff965601478800 RDI: ffff965601478864
RBP: 0000000000000001 R08: ffffffffb62182c0 R09: 0000000000000000
R10: 0000000000002703 R11: 000000000001993d R12: 0000000000000001
R13: ffffffffb6218340 R14: 0000000000000001 R15: 0000000000000000
acpi_idle_enter (drivers/acpi/processor_idle.c:711 (discriminator 3))
cpuidle_enter_state (drivers/cpuidle/cpuidle.c:239)
cpuidle_enter (drivers/cpuidle/cpuidle.c:358)
cpuidle_idle_call (kernel/sched/idle.c:240)
do_idle (kernel/sched/idle.c:305)
cpu_startup_entry (kernel/sched/idle.c:400 (discriminator 1))
start_secondary (arch/x86/kernel/smpboot.c:215 arch/x86/kernel/smpboot.c:249)
secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358)
</TASK>
---[ end trace 0000000000000000 ]---
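A note on reading the first register dump above, offered as a minimal sketch rather than a diagnosis. The decoded instructions load MSR 0x3a, combine EDX:EAX into one 64-bit value, test bit 20 and bit 0, and reach the ud2 only when bit 0 is clear. RDX still holds the combined value at the trap and reads 0, so the whole MSR appears to have read back as 0. The snippet below mirrors that check in user space. It assumes bit 0 is the lock bit and bit 20 the LMCE-enabled bit of the feature control MSR; all constant and function names are illustrative, not kernel identifiers.

```python
# Illustrative user-space mirror of the check seen in the disassembly.
# Constant and function names are hypothetical, not kernel identifiers.
MSR_FEAT_CTL = 0x3A          # MSR index loaded into %ecx before rdmsr
LOCKED_BIT = 1 << 0          # tested by "and $0x1,%ebx"
LMCE_ENABLED_BIT = 1 << 20   # tested by "shr $0x14,%rax; and $0x1,%eax"


def combine(edx: int, eax: int) -> int:
    """Rebuild the 64-bit value as "shl $0x20,%rdx; or %rax,%rdx" does."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


def decode(value: int) -> dict:
    """Report the two tested bits and whether the warning path is taken."""
    locked = bool(value & LOCKED_BIT)
    return {
        "locked": locked,
        "lmce_enabled": bool(value & LMCE_ENABLED_BIT),
        "would_warn": not locked,  # ud2 is reached only when bit 0 is clear
    }


if __name__ == "__main__":
    # RDX in the first register dump is 0, so both halves are taken as 0.
    print(decode(combine(0x0, 0x0)))
    # A value with both bits set, for comparison.
    print(decode(LOCKED_BIT | LMCE_ENABLED_BIT))
```

This only restates what the dump shows; it does not explain why the MSR would read as 0 on a CPU that is being stopped for reboot.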