Message-ID: <YqJHwXkg3Ny9fI3s@yaz-fattaah>
Date:   Thu, 9 Jun 2022 19:19:29 +0000
From:   Yazen Ghannam <yazen.ghannam@....com>
To:     "Luck, Tony" <tony.luck@...el.com>
Cc:     bp@...en8.de,
        Smita Koralahalli <Smita.KoralahalliChannabasappa@....com>,
        linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
        x86@...nel.org, hpa@...or.com,
        Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH v5 2/2] x86/mce: Add support for Extended Physical
 Address MCA changes

On Fri, Apr 15, 2022 at 09:37:57AM -0700, Luck, Tony wrote:
> On Fri, Apr 15, 2022 at 02:56:54PM +0000, Yazen Ghannam wrote:
> > 3) OS, or optionally BIOS, polls MCA banks and logs any valid errors.
> >    a) Since MCi_CTL, etc. are cleared due to reset, any errors detected are
> >       from before the reset.
> 
> On Intel not quite any error. H/w can still log to a bank but MCi_STATUS.EN bit
> will be zero. We've also had some BIOS code that did things that logged errors
> and then left them for the OS to find during boot.
> 
> But this sequence does give more confidence that errors found in banks during
> boot are "old".
> 
> > I agree. The Intel SDM and AMD APM have the following procedure, in summary.
> > 
> > 1) Set MCG_CTL
> > 2) Set MCi_CTL for all banks
> > 3) Read MCi_STATUS and log valid errors.
> > 4) Clear MCi_STATUS
> > 5) Set CR4.MCE
> 
> Yes. That's what the pseudo-code in Intel SDM Example 15-1 says :-(
> > 
> > I don't know of a reason why STATUS needs to be cleared after MCi_CTL is set.
> > The only thing I can think of is that enabling MCi_CTL may cause spurious info
> > logged in MCi_STATUS, and that needs to be cleared out. I'm asking AMD folks
> > about it.
> > 
> > Of course, this contradicts the flow I outlined above, and also the flow given
> > in the AMD Processor Programming Reference (PPR). I wonder if the
> > architectural documents have gotten stale compared to current guidelines. I'm
> > asking about this too.
> 
> I will ask architects about this sequence too.
>

Hi everyone,
It looks like the discrepancy between the Linux code and the x86 documents
isn't a major concern for AMD systems. However, it is highly recommended that
the banks be polled before MCA is enabled, so that any errors from before OS
boot are found. BIOS may enable MCA before the OS on some systems, but this
isn't always the case.
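
To illustrate the recommended ordering (poll the banks first, enable MCA
afterwards), here is a minimal user-space sketch. It is not kernel code:
the sim_* names, the sim_bank struct, and NUM_BANKS are hypothetical, and
the simulated registers stand in for the MCi_CTL/MCi_STATUS MSRs and
CR4.MCE. Only the position of the valid bit (bit 63 of MCi_STATUS) is taken
from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_BANKS	4		/* hypothetical bank count */
#define MCI_STATUS_VAL	(1ULL << 63)	/* MCi_STATUS valid bit */

/* Simulated per-bank registers; real code would use rdmsr/wrmsr. */
struct sim_bank {
	uint64_t ctl;
	uint64_t status;
};

static struct sim_bank banks[NUM_BANKS];
static bool cr4_mce;			/* stands in for CR4.MCE */

/* Log and clear every valid error; return the number logged. */
static int sim_poll_banks(void)
{
	int logged = 0;

	for (int i = 0; i < NUM_BANKS; i++) {
		if (!(banks[i].status & MCI_STATUS_VAL))
			continue;

		printf("bank %d: error from before enable, status=%#llx\n",
		       i, (unsigned long long)banks[i].status);
		banks[i].status = 0;
		logged++;
	}

	return logged;
}

/* Enable reporting in all banks, then machine check exceptions. */
static void sim_enable_mca(void)
{
	for (int i = 0; i < NUM_BANKS; i++)
		banks[i].ctl = ~0ULL;

	cr4_mce = true;
}

/* Poll first, enable second. Returns the number of old errors found. */
static int sim_mca_init(void)
{
	int old_errors = sim_poll_banks();

	/* No valid error may be left behind when reporting is turned on. */
	for (int i = 0; i < NUM_BANKS; i++)
		assert(!(banks[i].status & MCI_STATUS_VAL));

	sim_enable_mca();

	return old_errors;
}
```

Calling sim_mca_init() with a few valid bits set in banks[] logs and clears
those entries before any reporting is enabled, so everything it reports can
be treated as predating the enable step. Entries without the valid bit are
left untouched.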

Tony,
Did you get any feedback regarding the sequence above?

Also, please see the patch below which is based on Boris' patch from earlier
in this thread.

Thanks,
Yazen

-------

From dc4f5b862080daae1aae22f1ec460d9c4c8b6d20 Mon Sep 17 00:00:00 2001
From: Yazen Ghannam <yazen.ghannam@....com>
Date: Thu, 19 May 2022 17:25:47 +0000
Subject: [PATCH] x86/mce: Remove __mcheck_cpu_init_early()

The __mcheck_cpu_init_early() function was introduced so that some
vendor-specific features are detected before the first MCA polling event
done in __mcheck_cpu_init_generic().

Currently, __mcheck_cpu_init_early() is only used on AMD-based systems and
additional code will be needed to support various system configurations.

However, the current and future vendor-specific code should be done during
vendor init. This keeps all the vendor code in a common location and
simplifies the generic init flow.

Move all the __mcheck_cpu_init_early() code into mce_amd_feature_init().
Also, move __mcheck_cpu_init_generic() after
__mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
MCA polling event.

Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
---
 arch/x86/kernel/cpu/mce/amd.c  |  4 ++++
 arch/x86/kernel/cpu/mce/core.c | 20 +++-----------------
 2 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 1c87501e0fa3..f65224a2b02d 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -681,6 +681,10 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 	u32 low = 0, high = 0, address = 0;
 	int offset = -1;
 
+	mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
+	mce_flags.succor	 = !!cpu_has(c, X86_FEATURE_SUCCOR);
+	mce_flags.smca		 = !!cpu_has(c, X86_FEATURE_SMCA);
+	mce_flags.amd_threshold	 = 1;
 
 	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
 		if (mce_flags.smca)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 5f406d135d32..9efd6d010e2d 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1906,19 +1906,6 @@ static int __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
 	return 0;
 }
 
-/*
- * Init basic CPU features needed for early decoding of MCEs.
- */
-static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
-{
-	if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON) {
-		mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
-		mce_flags.succor	 = !!cpu_has(c, X86_FEATURE_SUCCOR);
-		mce_flags.smca		 = !!cpu_has(c, X86_FEATURE_SMCA);
-		mce_flags.amd_threshold	 = 1;
-	}
-}
-
 static void mce_centaur_feature_init(struct cpuinfo_x86 *c)
 {
 	struct mca_config *cfg = &mca_cfg;
@@ -2139,10 +2126,9 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 
 	mca_cfg.initialized = 1;
 
-	__mcheck_cpu_init_early(c);
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(c);
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 	__mcheck_cpu_setup_timer();
 }
 
@@ -2308,9 +2294,9 @@ static void mce_syscore_shutdown(void)
  */
 static void mce_syscore_resume(void)
 {
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(raw_cpu_ptr(&cpu_info));
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 }
 
 static struct syscore_ops mce_syscore_ops = {
@@ -2327,8 +2313,8 @@ static void mce_cpu_restart(void *data)
 {
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_timer();
 }
 
-- 
2.25.1
