lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <be786e9a-0302-47be-b2a8-bfa4449c7ab7@suse.com>
Date: Tue, 13 Jan 2026 20:55:08 +0200
From: Nikolay Borisov <nik.borisov@...e.com>
To: "Luck, Tony" <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>,
 "Li, Rongqing" <lirongqing@...du.com>
Cc: Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>,
 Dave Hansen <dave.hansen@...ux.intel.com>, "x86@...nel.org"
 <x86@...nel.org>, "H . Peter Anvin" <hpa@...or.com>,
 Yazen Ghannam <yazen.ghannam@....com>, "Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>,
 Avadhut Naik <avadhut.naik@....com>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>
Subject: Re: 答复: 答复: 答复: [外部邮件] Re: [PATCH] x86/mce: Fix timer interval adjustment after logging a MCE event



On 13.01.26 г. 20:53 ч., Luck, Tony wrote:
>>> The comment in mce_timer_fn says to adjust the polling interval, but
>>> I notice the kernel log always shows an MCE log every 5 minutes. Is this
>>> normal?
>>
>> Use git annotate to figure out which patch added this comment and in context
>> of what and that'll tell you why.
>>
>> As to the 5 minutes, look at how the check interval gets established.
> 
> Once upon a time the polling interval started out at 5 minutes, but the
> interval was halved each time an error was found (so interval went
> 150s, 75s, 37s, ... down to 1s). If no error was found, then the interval
> was doubled (going back up to 300s).
> 
> This is described in the comment:
> 
>          /*
>           * Alert userspace if needed. If we logged an MCE, reduce the polling
>           * interval, otherwise increase the polling interval.
>           */
> 
> It seems that the kernel isn't doing that today. Polling at a fixed 300 seconds
> event though errors are being found and logged. Interesting that the timestamps
> are 327.68 seconds apart, rather than 300 and change. So there is some strange
> stuff going on.
> 
> I can reproduce here on an Icelake system. Booted with mce=no_cmci to force polling
> (and turned of BIOS firmware first mode).  Injecting an error every 30 seconds I also see
> constant 327 seconds between logs (multiple logs show up because my injection picks memory
> channel "randomly", so there can be several banks with errors when polling happens).
> 
> $ dmesg | grep 'Machine Check Event:'
> [  662.632988] EDAC skx MC4: CPU 40: Machine Check Event: 0x0 Bank 13: 0x8c00014200800090
> [  662.727377] EDAC skx MC6: CPU 40: Machine Check Event: 0x0 Bank 21: 0x8c0000c200800090
> [  990.283484] EDAC skx MC4: CPU 121: Machine Check Event: 0x0 Bank 13: 0x8c00010200800090
> [  990.378233] EDAC skx MC6: CPU 121: Machine Check Event: 0x0 Bank 21: 0x8c00014200800090
> [  990.467199] EDAC skx MC0: CPU 3: Machine Check Event: 0x0 Bank 13: 0x8c00004200800090
> [ 1317.939260] EDAC skx MC4: CPU 122: Machine Check Event: 0x0 Bank 13: 0x8c00010200800090
> [ 1318.033721] EDAC skx MC6: CPU 122: Machine Check Event: 0x0 Bank 21: 0x8c00010200800090
> [ 1318.122612] EDAC skx MC0: CPU 14: Machine Check Event: 0x0 Bank 13: 0x8c00004200800090
> [ 1318.211507] EDAC skx MC2: CPU 14: Machine Check Event: 0x0 Bank 21: 0x8c00004200800090
> [ 1645.590773] EDAC skx MC4: CPU 129: Machine Check Event: 0x0 Bank 13: 0x8c00010200800090
> [ 1645.685153] EDAC skx MC6: CPU 129: Machine Check Event: 0x0 Bank 21: 0x8c00018200800090
> [ 1645.773744] EDAC skx MC0: CPU 100: Machine Check Event: 0x0 Bank 13: 0x8c00004200800090
> 
> -Tony
> 

At this stage I think lirongqi's patch is ok, but in the long run (i.e 
tomorrow) I will send a patch that simply eliminates mce_notify_irq's 
call in mce_timer_fn. I.e that function should be called only from the 
early notifier.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ