Message-ID: <87ms2hi7cm.ffs@tglx>
Date: Tue, 13 Jan 2026 21:30:01 +0100
From: Thomas Gleixner <tglx@...nel.org>
To: "Luck, Tony" <tony.luck@...el.com>, Borislav Petkov <bp@...en8.de>, "Li,
Rongqing" <lirongqing@...du.com>
Cc: Nikolay Borisov <nik.borisov@...e.com>, Ingo Molnar <mingo@...hat.com>,
Dave Hansen <dave.hansen@...ux.intel.com>, "x86@...nel.org"
<x86@...nel.org>, "H . Peter Anvin" <hpa@...or.com>, Yazen
Ghannam <yazen.ghannam@....com>, "Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>,
Avadhut Naik <avadhut.naik@....com>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-edac@...r.kernel.org"
<linux-edac@...r.kernel.org>
Subject: RE: Re: Re: Re: [External Mail] Re: [PATCH] x86/mce: Fix timer interval
adjustment after logging a MCE event

On Tue, Jan 13 2026 at 18:53, Tony Luck wrote:
>> > The comment in mce_timer_fn says to adjust the polling interval, but
>> > I notice the kernel log always shows an MCE log every 5 minutes. Is this
>> > normal?
>>
>> Use git annotate to figure out which patch added this comment and in context
>> of what and that'll tell you why.
>>
>> As to the 5 minutes, look at how the check interval gets established.
>
> Once upon a time the polling interval started out at 5 minutes, but the
> interval was halved each time an error was found (so interval went
> 150s, 75s, 37s, ... down to 1s). If no error was found, then the interval
> was doubled (going back up to 300s).
>
> This is described in the comment:
>
> /*
>  * Alert userspace if needed. If we logged an MCE, reduce the polling
>  * interval, otherwise increase the polling interval.
>  */
>
> It seems that the kernel isn't doing that today. It polls at a fixed 300 seconds
> even though errors are being found and logged.

How did we lose that?
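
For reference, the behaviour described in the quoted text above can be
modelled in a few lines of userspace C. This is only a sketch of the
described halving and doubling, not the code in mce_timer_fn(): the helper
name next_poll_interval_s(), the use of whole seconds instead of jiffies,
and the 1s floor and 300s ceiling (both taken from the description above)
are assumptions made for illustration.

```c
#include <stdbool.h>

/* Assumed bounds, taken from the description in this thread. */
#define POLL_MAX_S	300UL	/* polling starts here and climbs back to it */
#define POLL_MIN_S	1UL	/* fastest polling rate in the description */

/*
 * Hypothetical helper: return the next polling interval in seconds.
 * Halve the interval when the last poll logged an error, double it
 * when the poll was clean, and clamp the result to the bounds above.
 */
static unsigned long next_poll_interval_s(unsigned long cur_s, bool logged)
{
	unsigned long next_s;

	if (logged) {
		next_s = cur_s / 2;
		if (next_s < POLL_MIN_S)
			next_s = POLL_MIN_S;
	} else {
		next_s = cur_s * 2;
		if (next_s > POLL_MAX_S)
			next_s = POLL_MAX_S;
	}
	return next_s;
}
```

Starting from 300 and passing logged == true on every call yields 150, 75,
37, 18, 9, 4, 2, 1, 1. A single clean poll at 150 goes back to 300.
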
> Interesting that the timestamps are 327.68 seconds apart, rather than
> 300 and change. So there is some strange stuff going on.

Nothing strange. That's the batching inaccuracy, aka the granularity of the
timer wheel. See the big fat comment at the top of kernel/time/timer.c.

So looking at that table, I'm sure you have HZ=250. But that granularity
does not explain why that interval magic is no longer working....
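
To make the granularity arithmetic concrete, here is a small userspace
sketch of how the timer wheel level and its granularity follow from the
timeout. It is a simplified model, not the kernel implementation: the
constants are my reading of kernel/time/timer.c and should be treated as
assumptions, the function names are made up for this example, and the
rounding of an individual expiry to a bucket boundary is not modelled.

```c
/* Assumed wheel parameters, modelled on kernel/time/timer.c. */
#define WHEEL_CLK_SHIFT	3	/* each level is 8x coarser than the last */
#define WHEEL_LVL_SIZE	64UL	/* buckets per level */
#define WHEEL_DEPTH	9	/* number of levels assumed for HZ > 100 */

/* First timeout (in jiffies) that no longer fits into level n - 1. */
static unsigned long wheel_level_start(unsigned int n)
{
	return (WHEEL_LVL_SIZE - 1) << ((n - 1) * WHEEL_CLK_SHIFT);
}

/* Wheel level a timeout of delta jiffies would be queued in. */
static unsigned int wheel_level(unsigned long delta)
{
	unsigned int lvl = 0;

	while (lvl + 1 < WHEEL_DEPTH && delta >= wheel_level_start(lvl + 1))
		lvl++;
	return lvl;
}

/* Granularity of a level in milliseconds for a given tick rate. */
static unsigned long wheel_granularity_ms(unsigned int lvl, unsigned int hz)
{
	return (1000UL / hz) << (lvl * WHEEL_CLK_SHIFT);
}
```

With hz = 250, a 300s timeout is 75000 jiffies, which this model puts in
level 4 with a granularity of 16384ms. The observed 327.68s is exactly
20 * 16.384s.
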
Thanks,
tglx