Date:   Fri, 22 Mar 2019 18:34:36 +0100
From:   Borislav Petkov <bp@...en8.de>
To:     "Ghannam, Yazen" <Yazen.Ghannam@....com>
Cc:     "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "tony.luck@...el.com" <tony.luck@...el.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "rafal@...ecki.pl" <rafal@...ecki.pl>,
        "clemej@...il.com" <clemej@...il.com>
Subject: Re: [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on
 some Family 17h models

On Thu, Mar 21, 2019 at 08:25:18PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@....com>
> 
> AMD Family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
> errors under certain conditions. The errors are benign and can safely be
> ignored. However, the high error rate may cause the MCA threshold
> counter to overflow causing a high rate of thresholding interrupts. In
> addition, users may see the errors reported through the AMD MCE decoder
> module, even with the interrupt disabled, due to MCA polling.
> 
> This error is reported through the Instruction Fetch bank.
> 
> Clear the "Counter Present" bit in the Instruction Fetch bank's
> MCA_MISC0 register. This will prevent enabling MCA thresholding on this
> bank which will prevent the high interrupt rate due to this error.
> 
> Define a function to filter these errors from the MCE event pool.
> Install this function during AMD vendor init. The MCA banks are enabled
> after vendor init, so the filter function will be installed before the
> spurious errors will be reported.
> 
> Cc: <stable@...r.kernel.org> # 4.14.x: c95b323dcd35: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
> Cc: <stable@...r.kernel.org> # 4.14.x: 30aa3d26edb0: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
> Cc: <stable@...r.kernel.org> # 4.14.x
> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> ---
> Link:
> https://lkml.kernel.org/r/20190307212552.8865-2-Yazen.Ghannam@amd.com
> 
> v1->v2:
> * Filter out the error earlier in MCE code rather than later in EDAC.
> 
>  arch/x86/kernel/cpu/mce/amd.c | 57 ++++++++++++++++++++++++++++-------
>  1 file changed, 46 insertions(+), 11 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index e64de5149e50..2db85f65b41e 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -563,22 +563,55 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
>  	return offset;
>  }
>  
> +bool filter_mce_rv(struct mce *m)
> +{
> +	enum smca_bank_types bank_type = smca_get_bank_type(m->bank);
> +	u8 xec = (m->status >> 16) & 0x3F;
> +
> +	/*
> +	 * Spurious errors of this type may be reported.
> +	 * See Family 17h Models 10h-2Fh Erratum #1114.
> +	 */
> +	if (bank_type == SMCA_IF && xec == 10)
> +		return true;
> +
> +	return false;
> +}
> +
> +static void filter_mce_check(struct cpuinfo_x86 *c)
> +{
> +	if (c->x86 == 0x17 && (c->x86_model >= 0x10 && c->x86_model <= 0x2F))
> +		filter_mce = filter_mce_rv;
> +}

Why all the noodling here with a check function which assigns
filter_mce_rv (btw, that "rv" means nothing outside of AMD) and a
generic default_filter_mce?

Why not a simple filter_mce() in mce/core.c which calls amd_filter_mce()
based on x86_vendor, with amd_filter_mce() defined in mce/amd.c?
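
Something along these lines, as a rough userspace sketch only and not a tested kernel patch. The types cpu_info, mce_record, enum vendor and enum bank_type are hypothetical stand-ins for cpuinfo_x86, struct mce, x86_vendor and smca_bank_types, and the bank type is passed in directly instead of being looked up with smca_get_bank_type():

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel types; heavily simplified. */
enum vendor { VENDOR_INTEL, VENDOR_AMD };
enum bank_type { BANK_LS, BANK_IF, BANK_OTHER };

struct cpu_info {
	enum vendor vendor;
	unsigned int family;
	unsigned int model;
};

struct mce_record {
	uint64_t status;
	enum bank_type bank_type;	/* the kernel would derive this from m->bank */
};

/* Would live in mce/amd.c: all AMD-specific knowledge stays in here. */
static bool amd_filter_mce(const struct cpu_info *c, const struct mce_record *m)
{
	/* Extended error code: bits 21:16 of the status value. */
	uint8_t xec = (m->status >> 16) & 0x3F;

	/* Family 17h Models 10h-2Fh: spurious L1 BTB errors in the IF bank. */
	if (c->family == 0x17 && c->model >= 0x10 && c->model <= 0x2F)
		return m->bank_type == BANK_IF && xec == 10;

	return false;
}

/* Would live in mce/core.c: one entry point which dispatches on the vendor. */
static bool filter_mce(const struct cpu_info *c, const struct mce_record *m)
{
	if (c->vendor == VENDOR_AMD)
		return amd_filter_mce(c, m);

	return false;
}
```

That way there is no function pointer to install during vendor init and no separate default filter: the core code asks one question and only the AMD file knows about the erratum.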

-- 
Regards/Gruss,
    Boris.
