lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241030180041.GGZyJ0SXfa73Q7NmwF@fat_crate.local>
Date: Wed, 30 Oct 2024 19:01:50 +0100
From: Borislav Petkov <bp@...en8.de>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: Avadhut Naik <avadhut.naik@....com>, x86@...nel.org,
	linux-edac@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
	linux-kernel@...r.kernel.org, tony.luck@...el.com,
	qiuxu.zhuo@...el.com, tglx@...utronix.de, mingo@...hat.com,
	rostedt@...dmis.org, mchehab@...nel.org, john.allen@....com
Subject: Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA

On Wed, Oct 30, 2024 at 05:50:02PM +0100, Borislav Petkov wrote:
> Bah, crap. Lemme go back and take a second stab at this.

Second try.

The reason why I don't want to expose MCA_CONFIG to userspace is, well,
userspace doesn't need to know any "management" information the hw gives. It
either gets FRU text in that tracepoint or it doesn't. But it doesn't need to
know what MCA_CONFIG said or didn't say.

Ok?

Author: Yazen Ghannam <yazen.ghannam@....com>
Date:   Tue Oct 22 19:36:31 2024 +0000

    EDAC/mce_amd: Add support for FRU text in MCA
    
    A new "FRU Text in MCA" feature is defined where the Field Replaceable
    Unit (FRU) Text for a device is represented by a string in the new
    MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
    bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
    
    The FRU Text is populated dynamically for each individual error state
    (MCA_STATUS, MCA_ADDR, et al.). Handle the case where an MCA bank covers
    multiple devices, for example, a Unified Memory Controller (UMC) bank
    that manages two DIMMs.
    
      [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
      [ bp: Do not expose MCA_CONFIG to userspace yet. ]
    
    Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
    Co-developed-by: Avadhut Naik <avadhut.naik@....com>
    Signed-off-by: Avadhut Naik <avadhut.naik@....com>
    Signed-off-by: Borislav Petkov (AMD) <bp@...en8.de>
    Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 4d936ee20e24..4543cf2eb5e8 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -61,6 +61,7 @@
  *  - TCC bit is present in MCx_STATUS.
  */
 #define MCI_CONFIG_MCAX		0x1
+#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
 #define MCI_IPID_MCATYPE	0xFFFF0000
 #define MCI_IPID_HWID		0xFFF
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 194d9fd47d20..50d74d3bf0f5 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	struct mce *m = (struct mce *)data;
 	struct mce_hw_err *err = to_mce_hw_err(m);
 	unsigned int fam = x86_family(m->cpuid);
+	u32 mca_config_lo = 0, dummy;
 	int ecc;
 
 	if (m->kflags & MCE_HANDLED_CEC)
@@ -814,11 +815,9 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"));
 
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
-		u32 low, high;
-		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
+		rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), &mca_config_lo, &dummy);
 
-		if (!rdmsr_safe(addr, &low, &high) &&
-		    (low & MCI_CONFIG_MCAX))
+		if (mca_config_lo & MCI_CONFIG_MCAX)
 			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
 
 		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
@@ -853,8 +852,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 
 		if (m->status & MCI_STATUS_SYNDV) {
 			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
-			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
-				 err->vendor.amd.synd1, err->vendor.amd.synd2);
+			if (mca_config_lo & MCI_CONFIG_FRUTEXT) {
+				char frutext[17];
+
+				frutext[16] = '\0';
+				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
+				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
+
+				pr_emerg(HW_ERR "FRU Text: %s", frutext);
+			}
 		}
 
 		pr_cont("\n");

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ