[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <SJ1PR11MB6083F2650A8DB801F0EF26C8FC8EA@SJ1PR11MB6083.namprd11.prod.outlook.com>
Date: Tue, 13 Jan 2026 18:53:13 +0000
From: "Luck, Tony" <tony.luck@...el.com>
To: Borislav Petkov <bp@...en8.de>, "Li, Rongqing" <lirongqing@...du.com>
CC: Nikolay Borisov <nik.borisov@...e.com>, Thomas Gleixner <tglx@...nel.org>,
Ingo Molnar <mingo@...hat.com>, Dave Hansen <dave.hansen@...ux.intel.com>,
"x86@...nel.org" <x86@...nel.org>, "H . Peter Anvin" <hpa@...or.com>, "Yazen
Ghannam" <yazen.ghannam@....com>, "Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>,
Avadhut Naik <avadhut.naik@....com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-edac@...r.kernel.org"
<linux-edac@...r.kernel.org>
Subject: RE: 答复: 答复: 答复: [外部邮件] Re: [PATCH] x86/mce: Fix timer interval adjustment after logging a MCE event
> > The comment in mce_timer_fn says to adjust the polling interval, but
> > I notice the kernel log always shows an MCE log every 5 minutes. Is this
> > normal?
>
> Use git annotate to figure out which patch added this comment and in context
> of what and that'll tell you why.
>
> As to the 5 minutes, look at how the check interval gets established.
Once upon a time the polling interval started out at 5 minutes, but the
interval was halved each time an error was found (so interval went
150s, 75s, 37s, ... down to 1s). If no error was found, then the interval
was doubled (going back up to 300s).
This is described in the comment:
/*
* Alert userspace if needed. If we logged an MCE, reduce the polling
* interval, otherwise increase the polling interval.
*/
It seems that the kernel isn't doing that today. Polling at a fixed 300 seconds
event though errors are being found and logged. Interesting that the timestamps
are 327.68 seconds apart, rather than 300 and change. So there is some strange
stuff going on.
I can reproduce here on an Icelake system. Booted with mce=no_cmci to force polling
(and turned of BIOS firmware first mode). Injecting an error every 30 seconds I also see
constant 327 seconds between logs (multiple logs show up because my injection picks memory
channel "randomly", so there can be several banks with errors when polling happens).
$ dmesg | grep 'Machine Check Event:'
[ 662.632988] EDAC skx MC4: CPU 40: Machine Check Event: 0x0 Bank 13: 0x8c00014200800090
[ 662.727377] EDAC skx MC6: CPU 40: Machine Check Event: 0x0 Bank 21: 0x8c0000c200800090
[ 990.283484] EDAC skx MC4: CPU 121: Machine Check Event: 0x0 Bank 13: 0x8c00010200800090
[ 990.378233] EDAC skx MC6: CPU 121: Machine Check Event: 0x0 Bank 21: 0x8c00014200800090
[ 990.467199] EDAC skx MC0: CPU 3: Machine Check Event: 0x0 Bank 13: 0x8c00004200800090
[ 1317.939260] EDAC skx MC4: CPU 122: Machine Check Event: 0x0 Bank 13: 0x8c00010200800090
[ 1318.033721] EDAC skx MC6: CPU 122: Machine Check Event: 0x0 Bank 21: 0x8c00010200800090
[ 1318.122612] EDAC skx MC0: CPU 14: Machine Check Event: 0x0 Bank 13: 0x8c00004200800090
[ 1318.211507] EDAC skx MC2: CPU 14: Machine Check Event: 0x0 Bank 21: 0x8c00004200800090
[ 1645.590773] EDAC skx MC4: CPU 129: Machine Check Event: 0x0 Bank 13: 0x8c00010200800090
[ 1645.685153] EDAC skx MC6: CPU 129: Machine Check Event: 0x0 Bank 21: 0x8c00018200800090
[ 1645.773744] EDAC skx MC0: CPU 100: Machine Check Event: 0x0 Bank 13: 0x8c00004200800090
-Tony
Powered by blists - more mailing lists