Message-ID: <20240816140853.GB29375@yaz-khff2.amd.com>
Date: Fri, 16 Aug 2024 10:08:53 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: Borislav Petkov <bp@...en8.de>
Cc: linux-edac@...r.kernel.org, linux-kernel@...r.kernel.org,
	tony.luck@...el.com, x86@...nel.org, avadhut.naik@....com,
	john.allen@....com
Subject: Re: [PATCH 7/9] x86/mce: Unify AMD DFR handler with MCA Polling

On Tue, Jun 04, 2024 at 01:05:28PM +0200, Borislav Petkov wrote:
> On Thu, May 23, 2024 at 10:56:39AM -0500, Yazen Ghannam wrote:
> > +static bool smca_log_poll_error(struct mce *m, u32 *status_reg)
> 
> That handing of *status_reg back'n'forth just to clear it in the end is
> not nice. Let's get rid of it:
> 
> ---
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 0a9cff329487..a0ba82fe6de3 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -669,7 +669,7 @@ static void reset_thr_limit(unsigned int bank)
>  
>  DEFINE_PER_CPU(unsigned, mce_poll_count);
>  
> -static bool smca_log_poll_error(struct mce *m, u32 *status_reg)
> +static bool smca_log_poll_error(struct mce *m, u32 status_reg)
>  {
>  	/*
>  	 * If this is a deferred error found in MCA_STATUS, then clear
> @@ -686,8 +686,8 @@ static bool smca_log_poll_error(struct mce *m, u32 *status_reg)
>  	 * If the MCA_DESTAT register has valid data, then use
>  	 * it as the status register.
>  	 */
> -	*status_reg = MSR_AMD64_SMCA_MCx_DESTAT(m->bank);
> -	m->status = mce_rdmsrl(*status_reg);
> +	status_reg = MSR_AMD64_SMCA_MCx_DESTAT(m->bank);
> +	m->status = mce_rdmsrl(status_reg);
>  
>  	if (!(m->status & MCI_STATUS_VAL))
>  		return false;
> @@ -695,6 +695,8 @@ static bool smca_log_poll_error(struct mce *m, u32 *status_reg)
>  	if (m->status & MCI_STATUS_ADDRV)
>  		m->addr = mce_rdmsrl(MSR_AMD64_SMCA_MCx_DEADDR(m->bank));
>  
> +	mce_wrmsrl(status_reg, 0);
> +

I had to think on this for a while. The reason to clear the status
register at the very end is to make sure another error doesn't come in
and overwrite all the "aux" registers before we grab them.

***BUT*** the reason we are going down this path is because another
(higher priority) error *did* overwrite everything. And we're trying to
gather any leftover data. So all the "aux" registers are already
out-of-sync.

I don't think we can solve this in software. We'd need all the state
registers to be duplicated in hardware. For deferred errors we do have
duplicated status and address registers (MCA_DESTAT and MCA_DEADDR),
which seem to be enough.
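
For illustration only, here is a minimal user-space sketch of the "read
everything, then clear status last" ordering discussed above. It is not
kernel code: struct fake_bank, struct fake_record, fake_log_deferred() and
the FAKE_* bit definitions are hypothetical stand-ins for the MCA_DESTAT /
MCA_DEADDR registers and the MSR accessors, chosen only to make the
ordering visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for one bank's deferred-error registers. */
struct fake_bank {
	uint64_t destat;	/* models MCA_DESTAT */
	uint64_t deaddr;	/* models MCA_DEADDR */
};

/* Hypothetical record, loosely modelled on the fields used in the patch. */
struct fake_record {
	uint64_t status;
	uint64_t addr;
};

/* Illustrative bit positions for this sketch only. */
#define FAKE_STATUS_VAL		(1ULL << 63)
#define FAKE_STATUS_ADDRV	(1ULL << 58)

/*
 * Read the status, bail out if it holds no valid data, read the address
 * if the status says it is valid, and only then clear the status.
 * Clearing last means a new error cannot replace the address between
 * the status read and the address read.
 */
static bool fake_log_deferred(struct fake_bank *b, struct fake_record *r)
{
	assert(b && r);

	r->status = b->destat;
	if (!(r->status & FAKE_STATUS_VAL))
		return false;

	if (r->status & FAKE_STATUS_ADDRV)
		r->addr = b->deaddr;

	/* Clear the status register only after all dependent reads. */
	b->destat = 0;
	return true;
}
```

The sketch only covers the two registers that are duplicated for deferred
errors. It does not model the case described above, where a
higher-priority error has already overwritten the other "aux" registers.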

I'll see if this can be simplified even further.

Thanks,
Yazen
