[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190322173436.GK12472@zn.tnic>
Date: Fri, 22 Mar 2019 18:34:36 +0100
From: Borislav Petkov <bp@...en8.de>
To: "Ghannam, Yazen" <Yazen.Ghannam@....com>
Cc: "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"tony.luck@...el.com" <tony.luck@...el.com>,
"x86@...nel.org" <x86@...nel.org>,
"rafal@...ecki.pl" <rafal@...ecki.pl>,
"clemej@...il.com" <clemej@...il.com>
Subject: Re: [PATCH v2 2/2] x86/MCE/AMD: Don't report L1 BTB MCA errors on
some Family 17h models
On Thu, Mar 21, 2019 at 08:25:18PM +0000, Ghannam, Yazen wrote:
> From: Yazen Ghannam <yazen.ghannam@....com>
>
> AMD Family 17h Models 10h-2Fh may report a high number of L1 BTB MCA
> errors under certain conditions. The errors are benign and can safely be
> ignored. However, the high error rate may cause the MCA threshold
> counter to overflow causing a high rate of thresholding interrupts. In
> addition, users may see the errors reported through the AMD MCE decoder
> module, even with the interrupt disabled, due to MCA polling.
>
> This error is reported through the Instruction Fetch bank.
>
> Clear the "Counter Present" bit in the Instruction Fetch bank's
> MCA_MISC0 register. This will prevent enabling MCA thresholding on this
> bank which will prevent the high interrupt rate due to this error.
>
> Define a function to filter these errors from the MCE event pool.
> Install this function during AMD vendor init. The MCA banks are enabled
> after vendor init, so the filter function will be installed before the
> spurious errors will be reported.
>
> Cc: <stable@...r.kernel.org> # 4.14.x: c95b323dcd35: x86/MCE/AMD: Turn off MC4_MISC thresholding on all family 0x15 models
> Cc: <stable@...r.kernel.org> # 4.14.x: 30aa3d26edb0: x86/MCE/AMD: Carve out the MC4_MISC thresholding quirk
> Cc: <stable@...r.kernel.org> # 4.14.x
> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> ---
> Link:
> https://lkml.kernel.org/r/20190307212552.8865-2-Yazen.Ghannam@amd.com
>
> v1->v2:
> * Filter out the error earlier in MCE code rather than later in EDAC.
>
> arch/x86/kernel/cpu/mce/amd.c | 57 ++++++++++++++++++++++++++++-------
> 1 file changed, 46 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index e64de5149e50..2db85f65b41e 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -563,22 +563,55 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
> return offset;
> }
>
> +bool filter_mce_rv(struct mce *m)
> +{
> + enum smca_bank_types bank_type = smca_get_bank_type(m->bank);
> + u8 xec = (m->status >> 16) & 0x3F;
> +
> + /*
> + * Spurious errors of this type may be reported.
> + * See Family 17h Models 10h-2Fh Erratum #1114.
> + */
> + if (bank_type == SMCA_IF && xec == 10)
> + return true;
> +
> + return false;
> +}
> +
> +static void filter_mce_check(struct cpuinfo_x86 *c)
> +{
> + if (c->x86 == 0x17 && (c->x86_model >= 0x10 && c->x86_model <= 0x2F))
> + filter_mce = filter_mce_rv;
> +}
Why all the noodling here with a check function which assigns a
filter_mce_rv (btw, that "rv" means nothing outside of AMD) and a
generic default_filter_mce?
Why not a simple filter_mce() in mce/core.c which calls amd_filter_mce()
based on x86_vendor and amd_filter_mce() is defined in mce/amd.c?
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
Powered by blists - more mailing lists