lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241030204601.GA1505849@yaz-khff2.amd.com>
Date: Wed, 30 Oct 2024 16:46:01 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: "Naik, Avadhut" <avadnaik@....com>, bp@...en.de
Cc: Borislav Petkov <bp@...en8.de>, Avadhut Naik <avadhut.naik@....com>,
	x86@...nel.org, linux-edac@...r.kernel.org,
	linux-trace-kernel@...r.kernel.org, linux-kernel@...r.kernel.org,
	tony.luck@...el.com, qiuxu.zhuo@...el.com, tglx@...utronix.de,
	mingo@...hat.com, rostedt@...dmis.org, mchehab@...nel.org,
	john.allen@....com
Subject: Re: [PATCH v7 5/5] EDAC/mce_amd: Add support for FRU Text in MCA

On Wed, Oct 30, 2024 at 02:57:33PM -0500, Naik, Avadhut wrote:
> 
> 
> On 10/30/2024 13:01, Borislav Petkov wrote:
> > On Wed, Oct 30, 2024 at 05:50:02PM +0100, Borislav Petkov wrote:
> >> Bah, crap. Lemme go back and take a second stab at this.
> > 
> > Second try.
> > 
> > The reason why I don't want to expose MCA_CONFIG to userspace is, well,
> > userspace doesn't need to know any "management" information the hw gives. It
> > either gets FRU text in that tracepoint or it doesn't. But it doesn't need to
> > know what MCA_CONFIG said or didn't say.
> > 
> > Ok?
> > 
> So, for now, in the kernel, we log SYND1/2 registers only when they contain
> FRUText.
> While in the userspace, since MCA_CONFIG is not in the picture, we always
> interpret SYND1/2 data as FRUText.
> Rasdaemon might need to be tweaked accordingly. Will take care of it.
> Overall, sounds good.
>

Sounds good to me too.

Thanks,
Yazen

> Do you want me send out a revised version with these changes?
> 
> > Author: Yazen Ghannam <yazen.ghannam@....com>
> > Date:   Tue Oct 22 19:36:31 2024 +0000
> > 
> >     EDAC/mce_amd: Add support for FRU text in MCA
> >     
> >     A new "FRU Text in MCA" feature is defined where the Field Replaceable
> >     Unit (FRU) Text for a device is represented by a string in the new
> >     MCA_SYND1 and MCA_SYND2 registers. This feature is supported per MCA
> >     bank, and it is advertised by the McaFruTextInMca bit (MCA_CONFIG[9]).
> >     
> >     The FRU Text is populated dynamically for each individual error state
> >     (MCA_STATUS, MCA_ADDR, et al.). Handle the case where an MCA bank covers
> >     multiple devices, for example, a Unified Memory Controller (UMC) bank
> >     that manages two DIMMs.
> >     
> >       [ Yazen: Add Avadhut as co-developer for wrapper changes. ]
> >       [ bp: Do not expose MCA_CONFIG to userspace yet. ]
> >     
> >     Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> >     Co-developed-by: Avadhut Naik <avadhut.naik@....com>
> >     Signed-off-by: Avadhut Naik <avadhut.naik@....com>
> >     Signed-off-by: Borislav Petkov (AMD) <bp@...en8.de>
> >     Link: https://lore.kernel.org/r/20241022194158.110073-6-avadhut.naik@amd.com
> > 
> > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> > index 4d936ee20e24..4543cf2eb5e8 100644
> > --- a/arch/x86/include/asm/mce.h
> > +++ b/arch/x86/include/asm/mce.h
> > @@ -61,6 +61,7 @@
> >   *  - TCC bit is present in MCx_STATUS.
> >   */
> >  #define MCI_CONFIG_MCAX		0x1
> > +#define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
> >  #define MCI_IPID_MCATYPE	0xFFFF0000
> >  #define MCI_IPID_HWID		0xFFF
> >  
> > diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
> > index 194d9fd47d20..50d74d3bf0f5 100644
> > --- a/drivers/edac/mce_amd.c
> > +++ b/drivers/edac/mce_amd.c
> > @@ -795,6 +795,7 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
> >  	struct mce *m = (struct mce *)data;
> >  	struct mce_hw_err *err = to_mce_hw_err(m);
> >  	unsigned int fam = x86_family(m->cpuid);
> > +	u32 mca_config_lo = 0, dummy;
> >  	int ecc;
> >  
> >  	if (m->kflags & MCE_HANDLED_CEC)
> > @@ -814,11 +815,9 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
> >  		((m->status & MCI_STATUS_PCC)	? "PCC"	  : "-"));
> >  
> >  	if (boot_cpu_has(X86_FEATURE_SMCA)) {
> > -		u32 low, high;
> > -		u32 addr = MSR_AMD64_SMCA_MCx_CONFIG(m->bank);
> > +		rdmsr_safe(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), &mca_config_lo, &dummy);
> >  
> > -		if (!rdmsr_safe(addr, &low, &high) &&
> > -		    (low & MCI_CONFIG_MCAX))
> > +		if (mca_config_lo & MCI_CONFIG_MCAX)
> >  			pr_cont("|%s", ((m->status & MCI_STATUS_TCC) ? "TCC" : "-"));
> >  
> >  		pr_cont("|%s", ((m->status & MCI_STATUS_SYNDV) ? "SyndV" : "-"));
> > @@ -853,8 +852,15 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
> >  
> >  		if (m->status & MCI_STATUS_SYNDV) {
> >  			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
> > -			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
> > -				 err->vendor.amd.synd1, err->vendor.amd.synd2);
> > +			if (mca_config_lo & MCI_CONFIG_FRUTEXT) {
> > +				char frutext[17];
> > +
> > +				frutext[16] = '\0';
> > +				memcpy(&frutext[0], &err->vendor.amd.synd1, 8);
> > +				memcpy(&frutext[8], &err->vendor.amd.synd2, 8);
> > +
> > +				pr_emerg(HW_ERR "FRU Text: %s", frutext);
> > +			}
> >  		}
> >  
> >  		pr_cont("\n");
> > 
> 
> -- 
> Thanks,
> Avadhut Naik

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ