Message-ID: <20250915010010.3547-1-spasswolf@web.de>
Date: Mon, 15 Sep 2025 03:00:09 +0200
From: Bert Karwatzki <spasswolf@....de>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: Bert Karwatzki <spasswolf@....de>,
Borislav Petkov <bp@...en8.de>,
Tony Luck <tony.luck@...el.com>,
linux-kernel@...r.kernel.org,
linux-next@...r.kernel.org,
linux-edac@...r.kernel.org,
linux-acpi@...r.kernel.org,
x86@...nel.org,
rafael@...nel.org,
qiuxu.zhuo@...el.com,
nik.borisov@...e.com,
Smita.KoralahalliChannabasappa@....com
Subject: spurious mce Hardware Error messages in next-20250912
On my MSI Alpha 15 (amd64) laptop running Debian stable (trixie) and
kernel next-20250912, I noticed the following MCE error messages in dmesg:
[ T10] mce: [Hardware Error]: Machine check events logged
[ T10] [Hardware Error]: Corrected error, no action required.
[ T10] [Hardware Error]: CPU:0 (19:50:0) MC11_STATUS[-|CE|-|AddrV|-|-|-|UECC|-|Poison|-]: 0x8400aa4800a90139
[ T10] [Hardware Error]: Error Addr: 0x006637a200000020
[ T10] [Hardware Error]: IPID: 0x000700b040000000
[ T10] [Hardware Error]: L3 Cache Ext. Error Code: 41
[ T10] [Hardware Error]: cache level: L1, tx: GEN, mem-tx: DRD
[ T10] mce: [Hardware Error]: Machine check events logged
[ T10] [Hardware Error]: Corrected error, no action required.
[ T10] [Hardware Error]: CPU:0 (19:50:0) MC14_STATUS[-|CE|-|AddrV|PCC|-|SyndV|UECC|-|Poison|-]: 0x8724ac0800000000
[ T10] [Hardware Error]: Error Addr: 0x002bf52e00000020
[ T10] [Hardware Error]: IPID: 0x000700b040000000, Syndrome: 0x0000000000000042
[ T10]
[ T10] [Hardware Error]: L3 Cache Ext. Error Code: 0
[ T10] [Hardware Error]: cache level: RESV, tx: INSN
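For reference, the decoded fields above can be reproduced from the two raw status values. The sketch below assumes the SMCA bit layout used by the kernel's AMD EDAC decoder (drivers/edac/mce_amd.c) on a family 17h-or-later CPU, and assumes the TCC column is printed. The function names are hypothetical, and only the cache/memory error-code formats seen here are handled.

```python
# Minimal sketch (not the kernel's code): rebuild the decoded fields from raw
# MCx_STATUS values. Bit positions are assumed to follow the SMCA layout used
# by drivers/edac/mce_amd.c; all function names here are hypothetical.

LL_MSGS = ["RESV", "L1", "L2", "L3/GEN"]   # cache level, error code bits 1:0
TT_MSGS = ["INSN", "DATA", "GEN", "RESV"]  # transaction type, bits 3:2
R4_MSGS = ["GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"]  # bits 7:4


def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return (value >> n) & 1 == 1


def status_flags(status: int) -> str:
    """Build the 'Over|CE|MiscV|AddrV|PCC|TCC|SyndV|xECC|Deferred|Poison|Scrub' string."""
    if bit(status, 61):        # UC: uncorrected error
        ce_ue = "UE"
    elif bit(status, 44):      # deferred errors print '-' in this column
        ce_ue = "-"
    else:
        ce_ue = "CE"
    flags = [
        "Over" if bit(status, 62) else "-",
        ce_ue,
        "MiscV" if bit(status, 59) else "-",
        "AddrV" if bit(status, 58) else "-",
        "PCC" if bit(status, 57) else "-",
        "TCC" if bit(status, 55) else "-",   # assumed present (MCAX enabled)
        "SyndV" if bit(status, 53) else "-",
    ]
    ecc = (status >> 45) & 0x3  # bits 46:45; column omitted when 0
    if ecc:
        flags.append("CECC" if ecc == 2 else "UECC")
    flags += [
        "Deferred" if bit(status, 44) else "-",
        "Poison" if bit(status, 43) else "-",
        "Scrub" if bit(status, 40) else "-",
    ]
    return "|".join(flags)


def ext_error_code(status: int) -> int:
    """SMCA extended error code, bits 21:16."""
    return (status >> 16) & 0x3F


def error_code_fields(status: int) -> str:
    """Decode the 16-bit error code (bits 15:0); bus/internal formats are not handled."""
    ec = status & 0xFFFF
    if (ec & 0xF800) == 0x0800 or (ec & 0xF4FF) == 0x0400:
        raise NotImplementedError("bus and internal error codes are not decoded here")
    text = f"cache level: {LL_MSGS[ec & 0x3]}, tx: {TT_MSGS[(ec >> 2) & 0x3]}"
    if (ec & 0xFF00) == 0x0100:  # memory error: 0000 0001 RRRR TTLL
        r4 = (ec >> 4) & 0xF
        text += f", mem-tx: {R4_MSGS[r4] if r4 < len(R4_MSGS) else 'Wrong R4!'}"
    return text


for status in (0x8400AA4800A90139, 0x8724AC0800000000):
    print(f"[{status_flags(status)}] ext={ext_error_code(status)} {error_code_fields(status)}")
```

Run as-is, it prints `[-|CE|-|AddrV|-|-|-|UECC|-|Poison|-] ext=41 cache level: L1, tx: GEN, mem-tx: DRD` for the MC11 value and `[-|CE|-|AddrV|PCC|-|SyndV|UECC|-|Poison|-] ext=0 cache level: RESV, tx: INSN` for the MC14 value. This matches the log, so the printed text faithfully renders the raw status words.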
The messages start about 333.34 s after boot and usually reappear every 327.68 s
(yes, these timings are reproducible!):
$ dmesg | grep mce
[ 333.338334] [ T10] mce: [Hardware Error]: Machine check events logged
[ 333.338354] [ T10] mce: [Hardware Error]: Machine check events logged
[ 661.018322] [ T10] mce: [Hardware Error]: Machine check events logged
[ 661.018347] [ T10] mce: [Hardware Error]: Machine check events logged
[ 988.698305] [ T10] mce: [Hardware Error]: Machine check events logged
[ 988.698329] [ T10] mce: [Hardware Error]: Machine check events logged
[ 1316.378283] [ T10] mce: [Hardware Error]: Machine check events logged
[ 1316.378311] [ T10] mce: [Hardware Error]: Machine check events logged
[ 1644.058284] [ T10] mce: [Hardware Error]: Machine check events logged
[ 1644.058303] [ T10] mce: [Hardware Error]: Machine check events logged
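The interval can be checked directly from these timestamps. This small sketch takes the first line of each pair and computes the gap between consecutive bursts; each gap rounds to 327.68 s.

```python
# Timestamps (seconds since boot) of the first "Machine check events logged"
# line of each pair in the dmesg output above.
timestamps = [333.338334, 661.018322, 988.698305, 1316.378283, 1644.058284]

# Gap between consecutive bursts of messages.
deltas = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

for d in deltas:
    print(f"{d:.6f} s")
```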
As these messages do not appear in v6.17-rc5, I bisected the issue
(from v6.17-rc5 to next-20250912) and found this to be the first bad commit:
cf6f155e848b ("x86/mce: Unify AMD DFR handler with MCA Polling")
Do these messages report a genuine error that was not reported previously, or
are they a sign that the new code erroneously reports errors?
Hardware used:
$ lspci
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne Root Complex
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne IOMMU
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:02.4 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne PCIe GPP Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Renoir PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Renoir Internal PCIe GPP Bridge to Bus
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 51)
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Cezanne Data Fabric; Function 7
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c3)
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] (rev c3)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller
04:00.0 Network controller: MEDIATEK Corp. MT7921K (RZ608) Wi-Fi 6E 80MHz
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet Controller (rev 15)
06:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. KC3000/FURY Renegade NVMe SSD [E18] (rev 01)
07:00.0 Non-Volatile memory controller: Micron/Crucial Technology P1 NVMe PCIe SSD[Frampton] (rev 03)
08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
08:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Renoir Radeon High Definition Audio Controller
08:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 10h-1fh) Platform Security Processor
08:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Renoir/Cezanne USB 3.1
08:00.5 Multimedia controller: Advanced Micro Devices, Inc. [AMD] Audio Coprocessor (rev 01)
08:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Family 17h/19h/1ah HD Audio Controller
08:00.7 Signal processing controller: Advanced Micro Devices, Inc. [AMD] Sensor Fusion Hub
$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 25
model : 80
model name : AMD Ryzen 7 5800H with Radeon Graphics
stepping : 0
microcode : 0xa50000c
cpu MHz : 2145.090
cache size : 512 KB
physical id : 0
siblings : 16
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 16
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
bugs : sysret_ss_attrs spectre_v1 spectre_v2 spec_store_bypass srso ibpb_no_ret spectre_v2_user tsa
bogomips : 6388.44
TLB size : 2560 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 48 bits physical, 48 bits virtual
power management: ts ttp tm hwpstate cpb eff_freq_ro [13] [14]
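The CPU tuple "(19:50:0)" in the MCE log corresponds to the cpu family 25, model 80 and stepping 0 reported above. The decoder prints these values in hexadecimal, which a trivial check confirms:

```python
# /proc/cpuinfo reports family, model and stepping in decimal; the MCE decoder
# prints the same values in hex as "(family:model:stepping)".
cpu_family, model, stepping = 25, 80, 0
tag = f"({cpu_family:x}:{model:x}:{stepping:x})"
print(tag)  # (19:50:0)
```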
Bert Karwatzki