lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20260113213158.GUaWa5zunSfuAzra0n@fat_crate.local>
Date: Tue, 13 Jan 2026 22:31:58 +0100
From: Borislav Petkov <bp@...en8.de>
To: "Luck, Tony" <tony.luck@...el.com>
Cc: "Li, Rongqing" <lirongqing@...du.com>,
	Nikolay Borisov <nik.borisov@...e.com>,
	Thomas Gleixner <tglx@...nel.org>, Ingo Molnar <mingo@...hat.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	"x86@...nel.org" <x86@...nel.org>,
	"H . Peter Anvin" <hpa@...or.com>,
	Yazen Ghannam <yazen.ghannam@....com>,
	"Zhuo, Qiuxu" <qiuxu.zhuo@...el.com>,
	Avadhut Naik <avadhut.naik@....com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>
Subject: Re: 答复: 答复: 答复: [外部邮件] Re: [PATCH] x86/mce: Fix timer interval adjustment after logging a MCE event

On Tue, Jan 13, 2026 at 09:05:01PM +0000, Luck, Tony wrote:
> >> $ dmesg | grep 'Machine Check Event:'
> >
> > Did you see the "Machine check events logged\n" print from mce_notify_irq() in
> > dmesg too?
> 
> Yes.  I used the other grep pattern to see detail of which CPU/bank logged the error.
> Same pattern of timestamps shows up with this grep too.

Yah, this confirms the flow:

mce_timer_fn()-> ... -> machine_check_poll -> mce_log which will queue the
work and return.

Now, back in mce_timer_fn:

        /*
         * Alert userspace if needed. If we logged an MCE, reduce the polling
         * interval, otherwise increase the polling interval.
         */
        if (mce_notify_irq())


<--- we haven't ran the notifier chain yet so mce_need_notify is not set yet
so this won't hit and we won't halve the interval. I need to verify that
empirically.

                iv = max(iv / 2, (unsigned long) HZ/100);
        else
                iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));

And now the notifier chain runs. mce_early_notifier() sets the bit, does
mce_notify_irq(), that clears the bit and then the notifier chain a little
later (skx_edac) logs the error.

So this looks like a silly timing issue...

We could set mce_need_notify in mce_log(), zap this thing:

                if (__ratelimit(&ratelimit))
                        pr_info(HW_ERR "Machine check events logged\n");

in mce_notify_irq() or at least predicate it on the CEC being enabled and then
not call mce_notify_irq() in the notifier but leave it be called in the timer
function...

Ufff, how silly and overengineered we've made it. I need to think about
a cleaner solution tomorrow...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ