[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241030180041.GGZyJ0SXfa73Q7NmwF@fat_crate.local>
Date: Wed, 30 Oct 2024 19:01:50 +0100
From: Borislav Petkov <bp@...en8.de>
To: Yazen Ghannam <yazen.ghannam@....com>
Cc: Avadhut Naik <avadhut.naik@....com>, x86@...nel.org,
linux-edac@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
linux-kernel@...r.kernel.org, tony.luck@...el.com,
qiuxu.zhuo@...el.com, tglx@...utronix.de, mingo@...hat.com,
rostedt@...dmis.org, mchehab@...nel.org, john.allen@....com
Subject: Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA
On Wed, Oct 30, 2024 at 05:50:02PM +0100, Borislav Petkov wrote:
> Bah, crap. Lemme go back and take a second stab at this.
Second try.
The reason why I don't want to expose MCA_CONFIG to userspace is, well,
userspace doesn't need to know any "management" information the hw gives. It
either gets FRU text in that tracepoint or it doesn't. But it doesn't need to
know what MCA_CONFIG said or didn't say.
Ok?
Author: Yazen Ghannam <yazen.ghannam@....com>
Date: Tue Oct 22 19:36:31 2024 +0000
EDAC/mce_amd: Add support for FRU text in MCA
A new "FRU Text in MCA" feature is defined where the Field Replaceable
Unit (FRU) Text for a device is represented by a string in the new
MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
The FRU Text is populated dynamically for each individual error state
(MCA_STATUS, MCA_ADDR, et al.). Handle the case where an MCA bank covers
multiple devices, for example, a Unified Memory Controller (UMC) bank
that manages two DIMMs.
[ Yazen: Add Avadhut as co-developer for wrapper changes. ]
[ bp: Do not expose MCA_CONFIG to userspace yet. ]
Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
Co-developed-by: Avadhut Naik <avadhut.naik@....com>
Signed-off-by: Avadhut Naik <avadhut.naik@....com>
Signed-off-by: Borislav Petkov (AMD) <bp@...en8.de>
Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 4d936ee20e24..4543cf2eb5e8 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -61,6 +61,7 @@
* - TCC bit is present in MCx_STATUS.
*/
#define MCI_CONFIG_MCAX 0x1
+#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
#define MCI_IPID_MCATYPE 0xFFFF0000
#define MCI_IPID_HWID 0xFFF
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 194d9fd47d20..50d74d3bf0f5 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
struct mce *m = (struct mce *)data;
struct mce_hw_err *err = to_mce_hw_err(m);
unsigned int fam = x86_family(m->cpuid);
+ u32 mca_config_lo = 0, dummy;
int ecc;
if (m->kflags & MCE_HANDLED_CEC)
@@ -814,11 +815,9 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
((m->status & MCI_STATUS_PCC) ? "PCC" : "-"));
if (boot_cpu_has(X86_FEATURE_SMCA)) {
- u32 low, high;
- u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
+ rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), &mca_config_lo, &dummy);
- if (!rdmsr_safe(addr, &low, &high) &&
- (low & MCI_CONFIG_MCAX))
+ if (mca_config_lo & MCI_CONFIG_MCAX)
pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
@@ -853,8 +852,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
if (m->status & MCI_STATUS_SYNDV) {
pr_cont(", Syndrome: 0x%016llx\n", m->synd);
- pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
- err->vendor.amd.synd1, err->vendor.amd.synd2);
+ if (mca_config_lo & MCI_CONFIG_FRUTEXT) {
+ char frutext[17];
+
+ frutext[16] = '\0';
+ memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
+ memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
+
+ pr_emerg(HW_ERR "FRU Text: %s", frutext);
+ }
}
pr_cont("\n");
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
Powered by blists - more mailing lists