lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 25 Sep 2020 16:06:20 -0700
From:   "Luck, Tony" <tony.luck@...el.com>
To:     Borislav Petkov <bp@...en8.de>
Cc:     "Song, Youquan" <youquan.song@...el.com>,
        "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] x86/mce: Add Skylake quirk for patrol scrub reported
 errors

On Fri, Sep 25, 2020 at 09:19:12PM +0200, Borislav Petkov wrote:
> And after staring at this a bit, it looks like all it wants to do is to
> adjust the severity. And we have a severity grading mechanism. So let's
> see how ugly it would become if we extended it to check that too.
> 
> So how's that below instead?

In some ways that's pretty neat. But it would still be ugly if we need
to extend it further for other issues. Especially if they don't have such
a simple rule to adjust the severity.

> It builds here, I haven't even thought about testing it and I might've
> missed out on some aspects but tbh this looks much better to me. Because
> it is not bolted on the handling path but integral part of it.
> 
> Thoughts?
> 
> ---
> diff --git a/arch/x86/kernel/cpu/mce/severity.c b/arch/x86/kernel/cpu/mce/severity.c
> index e1da619add19..8c1a41aa5e40 100644
> --- a/arch/x86/kernel/cpu/mce/severity.c
> +++ b/arch/x86/kernel/cpu/mce/severity.c
> @@ -9,9 +9,11 @@
>  #include <linux/seq_file.h>
>  #include <linux/init.h>
>  #include <linux/debugfs.h>
> -#include <asm/mce.h>
>  #include <linux/uaccess.h>
>  
> +#include <asm/mce.h>
> +#include <asm/intel-family.h>
> +
>  #include "internal.h"
>  
>  /*
> @@ -40,9 +42,14 @@ static struct severity {
>  	unsigned char context;
>  	unsigned char excp;
>  	unsigned char covered;
> +	unsigned char cpu_model;
> +	unsigned char cpu_stepping;
> +	unsigned char bank_lo, bank_hi;

This would be better as a bit mask. I don't think we need this same
hack on the next generation of CPUs ... but if we did, the bank numbers
that would be affected don't form a continuous sequence.

>  	char *msg;
>  } severities[] = {
>  #define MCESEV(s, m, c...) { .sev = MCE_ ## s ## _SEVERITY, .msg = m, ## c }
> +#define BANK_RANGE(l, h) .bank_lo = l, .bank_hi = h
> +#define MODEL_STEPPING(m,s) .cpu_model = m, .cpu_stepping = s
>  #define  KERNEL		.context = IN_KERNEL
>  #define  USER		.context = IN_USER
>  #define  KERNEL_RECOV	.context = IN_KERNEL_RECOV
> @@ -97,7 +104,10 @@ static struct severity {
>  		KEEP, "Corrected error",
>  		NOSER, BITCLR(MCI_STATUS_UC)
>  		),
> -
> +	MCESEV(AO, "UnCorrected Patrol Scrub Error",
> +		NOSER, MASK(0xffffeff0, 0x001000c0),
> +		MODEL_STEPPING(INTEL_FAM6_SKYLAKE_X, 4),BANK_RANGE(13,18)
> +	),

I'd need to stare at the placement of this in the sequence of rules at some
non-Friday-afternoon time. It might be right, but as we've grumbled together
many times before that code is full of surprise side effects.

>  	/*
>  	 * known AO MCACODs reported via MCE or CMC:
>  	 *
> @@ -324,6 +334,12 @@ static int mce_severity_intel(struct mce *m, int tolerant, char **msg, bool is_e
>  			continue;
>  		if (s->excp && excp != s->excp)
>  			continue;
> +		if (s->cpu_model && boot_cpu_data.x86_model != s->cpu_model)
> +			continue;
> +		if (s->cpu_stepping && boot_cpu_data.x86_stepping <= s->cpu_stepping)
> +			continue;
> +		if (s->bank_lo && (s->bank_lo <= m->bank && m->bank <= s->bank_hi))
> +			continue;
>  		if (msg)
>  			*msg = s->msg;
>  		s->covered = 1;

-Tony

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ