[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <SN6PR12MB26394CD4E1BAC068B0B1AEF6F80A0@SN6PR12MB2639.namprd12.prod.outlook.com>
Date: Thu, 16 May 2019 20:20:58 +0000
From: "Ghannam, Yazen" <Yazen.Ghannam@....com>
To: Borislav Petkov <bp@...en8.de>
CC: "Luck, Tony" <tony.luck@...el.com>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>
Subject: RE: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in
hardware
> -----Original Message-----
> From: linux-edac-owner@...r.kernel.org <linux-edac-owner@...r.kernel.org> On Behalf Of Borislav Petkov
> Sent: Thursday, May 16, 2019 12:21 PM
> To: Ghannam, Yazen <Yazen.Ghannam@....com>
> Cc: Luck, Tony <tony.luck@...el.com>; linux-edac@...r.kernel.org; linux-kernel@...r.kernel.org; x86@...nel.org
> Subject: Re: [PATCH v3 5/6] x86/MCE: Save MCA control bits that get set in hardware
>
>
> On Thu, May 16, 2019 at 05:09:11PM +0000, Ghannam, Yazen wrote:
> > So that the sysfs files show the control values that are set in the
> > hardware. It seemed like this would be more helpful than showing all
> > 0xF's.
>
> Yeah, but it has been like that since forever and it hasn't bugged
> anybody. Probably because anybody doesn't even look at those files. As
> Tony says:
>
> "RAS is a lonely subsystem ... even EDAC gets more love."
>
> :-)))
>
> And adding yet another vendor check for this seemed just not worth it.
>
> > Should I send out another version of this set?
>
> I simply zapped 5/6. I still think your 6/6 makes sense though.
>
> ---
> From: Yazen Ghannam <yazen.ghannam@....com>
> Date: Tue, 30 Apr 2019 20:32:21 +0000
> Subject: [PATCH] x86/MCE: Determine MCA banks' init state properly
>
> The OS is expected to write all bits to MCA_CTL for each bank,
> thus enabling error reporting in all banks. However, some banks
> may be unused in which case the registers for such banks are
> Read-as-Zero/Writes-Ignored. Also, the OS may avoid setting some control
> bits because of quirks, etc.
>
> A bank can be considered uninitialized if the MCA_CTL register returns
> zero. This is because either the OS did not write anything or because
> the hardware is enforcing RAZ/WI for the bank.
>
> Set a bank's init value based on if the control bits are set or not in
> hardware. Return an error code in the sysfs interface for uninitialized
> banks.
>
> [ bp: Massage a bit. ]
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> Signed-off-by: Borislav Petkov <bp@...e.de>
> Cc: "H. Peter Anvin" <hpa@...or.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: "linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Tony Luck <tony.luck@...el.com>
> Cc: "x86@...nel.org" <x86@...nel.org>
> Link: https://lkml.kernel.org/r/20190430203206.104163-7-Yazen.Ghannam@amd.com
> ---
> arch/x86/kernel/cpu/mce/core.c | 17 +++++++++++++----
> 1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 5bcecadcf4d9..c049689f3d73 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1567,10 +1567,13 @@ static void __mcheck_cpu_init_clear_banks(void)
> for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
> struct mce_bank *b = &mce_banks[i];
>
> - if (!b->init)
> - continue;
> - wrmsrl(msr_ops.ctl(i), b->ctl);
> - wrmsrl(msr_ops.status(i), 0);
> + if (b->init) {
> + wrmsrl(msr_ops.ctl(i), b->ctl);
> + wrmsrl(msr_ops.status(i), 0);
> + }
> +
> + /* Bank is initialized if bits are set in hardware. */
> + b->init = !!b->ctl;
We don't actually know if there are bits set in hardware until we read it back. So I don't think this is adding anything new.
Thanks,
Yazen
Powered by blists - more mailing lists