Message-ID: <ab54f94827d200ac8a05b4ee180895b0cbd55014.camel@kernel.crashing.org>
Date: Sat, 26 Oct 2024 10:26:15 +1100
From: Benjamin Herrenschmidt <benh@...nel.crashing.org>
To: Kuniyuki Iwashima <kuniyu@...zon.com>, x86@...nel.org,
        linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org
Cc: Tony Luck <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>,
        Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>
Subject: Re: WARNING in lmce_supported() during reboot.

On Fri, 2024-10-25 at 16:13 -0700, Kuniyuki Iwashima wrote:
> Hello x86 maintainers,
> 
> We have seen the splat below a few times when just rebooting hosts.
> 
> It rarely happens and seems to be timing-related, so we don't have a
> reproducer.
> 
> Our kernel source in the splat is here,
> https://github.com/amazonlinux/linux/tree/kernel-6.1.61-85.141.amzn2023
> 
> and the triggered WARN_ON_ONCE() in lmce_supported() is here.
> https://github.com/amazonlinux/linux/blob/kernel-6.1.61-85.141.amzn2023/arch/x86/kernel/cpu/mce/intel.c#L124

(switching to my lkml/spam friendly email)

I also hit it with 6.1.112-122.189.amzn2023.x86_64

Cheers,
Ben.

> Do you have any hints?
> 
> Thanks in advance.
> 
> 
> ACPI: PM: Preparing to enter system sleep state S5
> reboot: Restarting system
> reboot: machine restart
> ------------[ cut here ]------------
> WARNING: CPU: 1 PID: 0 at arch/x86/kernel/cpu/mce/intel.c:124
> lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124
> arch/x86/kernel/cpu/mce/intel.c:99) 
> Modules linked in: ib_core binfmt_misc ext4 crc16 mbcache jbd2 sunrpc
> mousedev atkbd psmouse ghash_clmulni_intel vivaldi_fmap libps2
> aesni_intel crypto_simd cryptd i8042 serio ena button sch_fq_codel
> dm_mod fuse configfs dax loop dmi_sysfs simpledrm drm_shmem_helper
> drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect
> sysimgblt fb_sys_fops cfbcopyarea drm i2c_core
> drm_panel_orientation_quirks backlight fb crc32_pclmul crc32c_intel
> fbdev efivarfs
> Hardware name: Amazon EC2 c6i.4xlarge/, BIOS 1.0 10/16/2017
> RIP: 0010:lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124
> arch/x86/kernel/cpu/mce/intel.c:99) 
> Code: 81 fb 00 00 00 09 75 da b9 3a 00 00 00 0f 32 48 c1 e2 20 48 09
> c2 48 89 d3 66 90 48 89 d8 48 c1 e8 14 83 e0 01 83 e3 01 75 ba <0f>
> 0b 31 c0 eb b4 31 d2 48 89 de bf 3a 00 00 00 e8 6b e6 57 00 eb
> All code
> ========
>    0:	81 fb 00 00 00 09    	cmp    $0x9000000,%ebx
>    6:	75 da                	jne    0xffffffffffffffe2
>    8:	b9 3a 00 00 00       	mov    $0x3a,%ecx
>    d:	0f 32                	rdmsr
>    f:	48 c1 e2 20          	shl    $0x20,%rdx
>   13:	48 09 c2             	or     %rax,%rdx
>   16:	48 89 d3             	mov    %rdx,%rbx
>   19:	66 90                	xchg   %ax,%ax
>   1b:	48 89 d8             	mov    %rbx,%rax
>   1e:	48 c1 e8 14          	shr    $0x14,%rax
>   22:	83 e0 01             	and    $0x1,%eax
>   25:	83 e3 01             	and    $0x1,%ebx
>   28:	75 ba                	jne    0xffffffffffffffe4
>   2a:*	0f 0b                	ud2		<-- trapping instruction
>   2c:	31 c0                	xor    %eax,%eax
>   2e:	eb b4                	jmp    0xffffffffffffffe4
>   30:	31 d2                	xor    %edx,%edx
>   32:	48 89 de             	mov    %rbx,%rsi
>   35:	bf 3a 00 00 00       	mov    $0x3a,%edi
>   3a:	e8 6b e6 57 00       	call   0x57e6aa
>   3f:	eb                   	.byte 0xeb
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	0f 0b                	ud2
>    2:	31 c0                	xor    %eax,%eax
>    4:	eb b4                	jmp    0xffffffffffffffba
>    6:	31 d2                	xor    %edx,%edx
>    8:	48 89 de             	mov    %rbx,%rsi
>    b:	bf 3a 00 00 00       	mov    $0x3a,%edi
>   10:	e8 6b e6 57 00       	call   0x57e680
>   15:	eb                   	.byte 0xeb
> RSP: 0018:ffffa18f00154fb8 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000000000003a
> RDX: 0000000000000000 RSI: 00000000000000ff RDI: ffff965cfe2599c0
> RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: ffffa18f00154ff8 R12: 0000000000000001
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff965cfe240000(0000)
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f8485dfba30 CR3: 0000000389a10003 CR4: 00000000007706e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 55555554
> Call Trace:
> <IRQ>
> ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) 
> ? show_trace_log_lvl (arch/x86/kernel/dumpstack.c:259) 
> ? mce_intel_feature_clear (arch/x86/kernel/cpu/mce/intel.c:465
> arch/x86/kernel/cpu/mce/intel.c:502) 
> ? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124
> arch/x86/kernel/cpu/mce/intel.c:99) 
> ? __warn (kernel/panic.c:672) 
> ? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124
> arch/x86/kernel/cpu/mce/intel.c:99) 
> ? report_bug (lib/bug.c:201 lib/bug.c:219) 
> ? handle_bug (arch/x86/kernel/traps.c:324) 
> ? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1)) 
> ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:568) 
> ? lmce_supported (arch/x86/kernel/cpu/mce/intel.c:124
> arch/x86/kernel/cpu/mce/intel.c:99) 
> ? clear_local_APIC (./arch/x86/include/asm/apic.h:393
> arch/x86/kernel/apic/apic.c:1192) 
> mce_intel_feature_clear (arch/x86/kernel/cpu/mce/intel.c:465
> arch/x86/kernel/cpu/mce/intel.c:502) 
> stop_this_cpu (arch/x86/kernel/process.c:780) 
> __sysvec_reboot (arch/x86/kernel/smp.c:140) 
> sysvec_reboot (arch/x86/kernel/smp.c:136 (discriminator 14)) 
> </IRQ>
> <TASK>
> asm_sysvec_reboot (./arch/x86/include/asm/idtentry.h:656) 
> RIP: 0010:acpi_idle_do_entry (./arch/x86/include/asm/irqflags.h:40
> ./arch/x86/include/asm/irqflags.h:75
> drivers/acpi/processor_idle.c:113 drivers/acpi/processor_idle.c:572) 
> Code: 75 08 48 8b 15 b1 81 df 02 ed c3 cc cc cc cc 65 48 8b 04 25 00
> ff 01 00 48 8b 00 a8 08 75 eb 66 90 0f 00 2d 58 c8 6a 00 fb f4 <fa>
> c3 cc cc cc cc e9 01 fc ff ff 90 0f 1f 44 00 00 41 56 41 55 41
> All code
> ========
>    0:	75 08                	jne    0xa
>    2:	48 8b 15 b1 81 df 02 	mov    0x2df81b1(%rip),%rdx        #
> 0x2df81ba
>    9:	ed                   	in     (%dx),%eax
>    a:	c3                   	ret
>    b:	cc                   	int3
>    c:	cc                   	int3
>    d:	cc                   	int3
>    e:	cc                   	int3
>    f:	65 48 8b 04 25 00 ff 	mov    %gs:0x1ff00,%rax
>   16:	01 00 
>   18:	48 8b 00             	mov    (%rax),%rax
>   1b:	a8 08                	test   $0x8,%al
>   1d:	75 eb                	jne    0xa
>   1f:	66 90                	xchg   %ax,%ax
>   21:	0f 00 2d 58 c8 6a 00 	verw   0x6ac858(%rip)        #
> 0x6ac880
>   28:	fb                   	sti
>   29:	f4                   	hlt
>   2a:*	fa                   	cli		<-- trapping instruction
>   2b:	c3                   	ret
>   2c:	cc                   	int3
>   2d:	cc                   	int3
>   2e:	cc                   	int3
>   2f:	cc                   	int3
>   30:	e9 01 fc ff ff       	jmp    0xfffffffffffffc36
>   35:	90                   	nop
>   36:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>   3b:	41 56                	push   %r14
>   3d:	41 55                	push   %r13
>   3f:	41                   	rex.B
> 
> Code starting with the faulting instruction
> ===========================================
>    0:	fa                   	cli
>    1:	c3                   	ret
>    2:	cc                   	int3
>    3:	cc                   	int3
>    4:	cc                   	int3
>    5:	cc                   	int3
>    6:	e9 01 fc ff ff       	jmp    0xfffffffffffffc0c
>    b:	90                   	nop
>    c:	0f 1f 44 00 00       	nopl   0x0(%rax,%rax,1)
>   11:	41 56                	push   %r14
>   13:	41 55                	push   %r13
>   15:	41                   	rex.B
> RSP: 0018:ffffa18f000afe70 EFLAGS: 00000246
> RAX: 0000000000004000 RBX: ffff965603d92400 RCX: 4000000000000000
> RDX: ffff965cfe240000 RSI: ffff965601478800 RDI: ffff965601478864
> RBP: 0000000000000001 R08: ffffffffb62182c0 R09: 0000000000000000
> R10: 0000000000002703 R11: 000000000001993d R12: 0000000000000001
> R13: ffffffffb6218340 R14: 0000000000000001 R15: 0000000000000000
> acpi_idle_enter (drivers/acpi/processor_idle.c:711 (discriminator 3))
> cpuidle_enter_state (drivers/cpuidle/cpuidle.c:239) 
> cpuidle_enter (drivers/cpuidle/cpuidle.c:358) 
> cpuidle_idle_call (kernel/sched/idle.c:240) 
> do_idle (kernel/sched/idle.c:305) 
> cpu_startup_entry (kernel/sched/idle.c:400 (discriminator 1)) 
> start_secondary (arch/x86/kernel/smpboot.c:215
> arch/x86/kernel/smpboot.c:249) 
> secondary_startup_64_no_verify (arch/x86/kernel/head_64.S:358) 
> </TASK>
> ---[ end trace 0000000000000000 ]---
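
For anyone reading the first disassembly in the splat: the code loads 0x3a into %ecx, executes rdmsr, then tests bit 20 (shr $0x14 / and $0x1) and bit 0 (and $0x1,%ebx) of the result, and only reaches the ud2 behind the WARN when bit 0 is clear. The register dump shows RBX = 0, so the MSR appears to have read back as all zeroes on that CPU. Below is a minimal Python sketch of that bit test, not the kernel's code. The function and constant names are made up for illustration, and reading bit 0 as the "locked" bit and bit 20 as the LMCE enable bit is an assumption based on the usual layout of the feature control MSR.

```python
# Illustrative sketch only: names are hypothetical, bit positions are
# taken from the disassembly in the splat above.
MSR_INDEX = 0x3A        # value moved into %ecx before rdmsr
LOCKED_BIT = 1 << 0     # tested by "and $0x1,%ebx" (assumed: lock bit)
LMCE_BIT = 1 << 20      # tested by "shr $0x14,%rax; and $0x1,%eax" (assumed: LMCE enable)


def decode_feat_ctl(value):
    """Return (locked, lmce_enabled, would_warn) for a raw 64-bit MSR value."""
    locked = bool(value & LOCKED_BIT)
    lmce_enabled = bool(value & LMCE_BIT)
    # The ud2 is reached only when the "jne" after the bit 0 test is not taken,
    # i.e. when bit 0 is clear.
    would_warn = not locked
    return locked, lmce_enabled, would_warn


if __name__ == "__main__":
    # RBX in the splat holds the combined rdmsr result, which is 0.
    print(decode_feat_ctl(0x0))
```

Feeding in the RBX value from the splat gives a cleared lock bit, which matches the path that ends in the WARN. This only restates what the dump shows; it says nothing about why the MSR would read as zero during reboot.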
