lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200616192952.GO13515@zn.tnic>
Date:   Tue, 16 Jun 2020 21:29:52 +0200
From:   Borislav Petkov <bp@...en8.de>
To:     Tony Luck <tony.luck@...el.com>
Cc:     Youquan Song <youquan.song@...el.com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/mce: Add Skylake quirk for patrol scrub reported
 errors

On Mon, Jun 15, 2020 at 11:40:56AM -0700, Tony Luck wrote:
> From: Youquan Song <youquan.song@...el.com>
> 
> Skylake has a mode where the system administrator can use a BIOS setup
> option to request that the memory controller report uncorrected errors
> found by the patrol scrubber as corrected.  This results in them being
> signalled using CMCI, which is less disruptive than a machine check.
> 
> Add a quirk to detect that a "corrected" error is actually a downgraded
> uncorrected error with model specific checks for the "MSCOD" signature in
> MCi_STATUS and that the error was reported from a memory controller bank.
> 
> Adjust the severity to MCE_AO_SEVERITY so that Linux will try to take
> the affected page offline.
> 
> [Tony: Wordsmith commit comment]
> 
> Signed-off-by: Youquan Song <youquan.song@...el.com>
> Signed-off-by: Tony Luck <tony.luck@...el.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c | 30 ++++++++++++++++++++++++++++++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index e9265e2f28c9..0dbd0a21a0bf 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -123,6 +123,8 @@ static struct irq_work mce_irq_work;
>  
>  static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
>  
> +static void no_adjust_mce_log(struct mce *m) {};
> +static void (*adjust_mce_log)(struct mce *m) = no_adjust_mce_log;
>  /*
>   * CPU/chipset specific EDAC code can register a notifier call here to print
>   * MCE errors in a human-readable form.
> @@ -772,6 +774,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
>  		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
>  			goto clear_it;
>  
> +		adjust_mce_log(&m);
>  		mce_log(&m);

Two things: can that error type be detected when #MC gets raised, i.e., in
do_machine_check() as part of scanning all banks?

If so, then the adjusting needs to happen inside mce_log().

Also, that assignment to the function pointer doesn't make much sense to
me and I think you should do the vendor/family/model checking straight
in a function adjust_mce_log() which gets called by whoever...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ