Message-ID: <YgFjrmFuc/b+0OCE@agluck-desk2.amr.corp.intel.com>
Date: Mon, 7 Feb 2022 10:23:42 -0800
From: "Luck, Tony" <tony.luck@...el.com>
To: Jue Wang <juew@...gle.com>
Cc: Borislav Petkov <bp@...en8.de>, x86@...nel.org,
linux-kernel@...r.kernel.org, patches@...ts.linux.dev
Subject: Re: [RFC] x86/mce: Add workaround for SKX/CLX/CPX spurious machine
checks
On Sun, Feb 06, 2022 at 08:36:40PM -0800, Jue Wang wrote:
> +static bool quirk_skylake_repmov(void)
> +{
> + /*
> + * State that represents if an SRAR MCE has already signaled on the DCU bank.
> + */
> + static DEFINE_PER_CPU(bool, srar_dcu_signaled);
> +
> + if (unlikely(!__this_cpu_read(srar_dcu_signaled))) {
> + u64 mc1_status = mce_rdmsrl(MSR_IA32_MCx_STATUS(1));
Jue,
When I reviewed this for you off-list, I didn't notice that you
dropped the test for mcgstatus & MCG_STATUS_LMCES as part of
moving to a helper function and expanding the test for more
bits in mc1_status.
I think that test is still important ... knowing that this is
a *local* machine check before making a decision based on just what this
CPU observes makes this a bit more robust.
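
A rough sketch of the ordering I mean, written as a standalone userspace
model rather than kernel code. The helper names quirk_applies() and
srar_like() are made up for illustration, and srar_like() is only a
simplified stand-in for the is_intel_srar() test in the patch. The bit
positions follow the architectural MCG_STATUS / MCi_STATUS layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in IA32_MCG_STATUS and IA32_MCi_STATUS. */
#define MCG_STATUS_LMCES	(1ULL << 3)	/* machine check delivered locally */
#define MCI_STATUS_VAL		(1ULL << 63)	/* register contents valid */
#define MCI_STATUS_UC		(1ULL << 61)	/* uncorrected error */
#define MCI_STATUS_EN		(1ULL << 60)	/* error reporting enabled */
#define MCI_STATUS_PCC		(1ULL << 57)	/* processor context corrupt */
#define MCI_STATUS_S		(1ULL << 56)	/* signaled via machine check */
#define MCI_STATUS_AR		(1ULL << 55)	/* action required */

/*
 * Hypothetical, simplified stand-in for is_intel_srar(): an uncorrected,
 * signaled, action-required error with processor context intact.
 */
static bool srar_like(uint64_t status)
{
	const uint64_t want = MCI_STATUS_VAL | MCI_STATUS_UC | MCI_STATUS_EN |
			      MCI_STATUS_S | MCI_STATUS_AR;

	return (status & (want | MCI_STATUS_PCC)) == want;
}

/*
 * Hypothetical helper: only trust this CPU's view of bank 1 when the
 * machine check was local to this CPU.
 */
static bool quirk_applies(uint64_t mcgstatus, uint64_t mc1_status)
{
	if (!(mcgstatus & MCG_STATUS_LMCES))
		return false;	/* broadcast MCE: do not decide from local state */

	return srar_like(mc1_status);
}
```

In the real helper the mcgstatus value would come from
mce_rdmsrl(MSR_IA32_MCG_STATUS), read before MSR_IA32_MCx_STATUS(1) is
examined.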
> +
> + if (is_intel_srar(mc1_status)) {
> + __this_cpu_write(srar_dcu_signaled, true);
> + msr_clear_bit(MSR_IA32_MISC_ENABLE,
> + MSR_IA32_MISC_ENABLE_FAST_STRING_BIT);
> + mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> + mce_wrmsrl(MSR_IA32_MCx_STATUS(1), 0);
> + pr_err("First SRAR MCE on DCU, CPU: %d, disable fast string copy.\n",
> + smp_processor_id());
> + return true;
> + }
> + }
> + return false;
> +}
-Tony