Message-ID: <20170322211329.GE20697@nazgul.tnic>
Date: Wed, 22 Mar 2017 22:13:29 +0100
From: Borislav Petkov <bp@...en8.de>
To: Yazen Ghannam <Yazen.Ghannam@....com>
Cc: linux-edac@...r.kernel.org, Tony Luck <tony.luck@...el.com>,
x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier
On Mon, Mar 20, 2017 at 03:26:54PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@....com>
>
> Deferred errors on AMD systems may get an Action Optional severity with the
> goal of being handled by the SRAO notifier block. However, the process of
> determining if an address is usable is different between Intel and AMD. So
> define vendor-specific functions for this.
>
> Also, check for the AO severity before determining if an address is usable
> to possibly save some cycles.
>
> Signed-off-by: Yazen Ghannam <yazen.ghannam@....com>
> ---
> Link: http://lkml.kernel.org/r/1486760120-60944-3-git-send-email-Yazen.Ghannam@amd.com
>
> v1->v2:
> - New in v2. Based on v1 patch 3.
> - Update SRAO notifier block to handle errors from SMCA systems.
>
> arch/x86/kernel/cpu/mcheck/mce.c | 52 ++++++++++++++++++++++++++++++----------
> 1 file changed, 40 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 5e365a2..1a2669d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -547,19 +547,49 @@ static void mce_report_event(struct pt_regs *regs)
> * be somewhat complicated (e.g. segment offset would require an instruction
> * parser). So only support physical addresses up to page granuality for now.
> */
> -static int mce_usable_address(struct mce *m)
> +static int mce_usable_address_intel(struct mce *m, unsigned long *pfn)
So this function is basically saying whether the address is usable, but
then it is also returning it if so. And it is doing that through an
in/out argument. Yuck.

So it sounds to me like this functionality needs a redesign: something
like a get_usable_address() function (the "mce_" prefix is not really
needed as it is static) which returns an invalid value when it
determines that there is no usable address, and the address itself when
it succeeds.
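Something along these lines, maybe (a rough sketch only - the
MCE_INVALID_PFN sentinel is made up here for illustration, and I'm
returning the PFN rather than the raw address since that is what the
callers want; the SMCA translation from this patch would then go into
the AMD branch):

#define MCE_INVALID_PFN	(~0UL)	/* illustrative sentinel only */

static unsigned long get_usable_address(struct mce *m)
{
	if (!(m->status & MCI_STATUS_MISCV) || !(m->status & MCI_STATUS_ADDRV))
		return MCE_INVALID_PFN;

	/* Checks after this one are Intel-specific: */
	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
		return m->addr >> PAGE_SHIFT;

	if (MCI_MISC_ADDR_LSB(m->misc) > PAGE_SHIFT)
		return MCE_INVALID_PFN;
	if (MCI_MISC_ADDR_MODE(m->misc) != MCI_MISC_ADDR_PHYS)
		return MCE_INVALID_PFN;

	return m->addr >> PAGE_SHIFT;
}

with the callers then doing something like:

	pfn = get_usable_address(m);
	if (pfn == MCE_INVALID_PFN)
		return NOTIFY_DONE;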
> {
> - if (!(m->status & MCI_STATUS_MISCV) || !(m->status & MCI_STATUS_ADDRV))
> + if (!(m->status & MCI_STATUS_MISCV))
> return 0;
> -
> - /* Checks after this one are Intel-specific: */
> - if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> - return 1;
> -
> if (MCI_MISC_ADDR_LSB(m->misc) > PAGE_SHIFT)
> return 0;
> if (MCI_MISC_ADDR_MODE(m->misc) != MCI_MISC_ADDR_PHYS)
> return 0;
> +
> + *pfn = m->addr >> PAGE_SHIFT;
> + return 1;
> +}
> +
> +/* Only support this on SMCA systems and errors logged from a UMC. */
> +static int mce_usable_address_amd(struct mce *m, unsigned long *pfn)
> +{
> + u8 umc;
> + u16 nid = cpu_to_node(m->extcpu);
> + u64 addr;
> +
> + if (!mce_flags.smca)
> + return 0;
So on !SMCA systems there'll be no usable address ever! Even with
MCI_STATUS_ADDRV set.
Please *test* your stuff on all affected hardware before submitting.
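IOW, the pre-SMCA case needs to keep working. Roughly something like
this at the top of the AMD helper, assuming the old behavior (MISCV
set, with ADDRV already checked by the caller) should be preserved
there:

	/*
	 * Pre-SMCA systems: keep the old behavior - MISCV (ADDRV is
	 * already checked by the caller) is all we can go by.
	 */
	if (!mce_flags.smca) {
		if (!(m->status & MCI_STATUS_MISCV))
			return 0;

		*pfn = m->addr >> PAGE_SHIFT;
		return 1;
	}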
> +
> + umc = find_umc_channel(m);
> +
> + if (umc < 0 || umc_normaddr_to_sysaddr(m->addr, nid, umc, &addr))
> + return 0;
> +
> + *pfn = addr >> PAGE_SHIFT;
> + return 1;
> +}
> +
> +static int mce_usable_address(struct mce *m, unsigned long *pfn)
> +{
> + if (!(m->status & MCI_STATUS_ADDRV))
> + return 0;
What happened to the MCI_STATUS_MISCV bit check?
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> + return mce_usable_address_intel(m, pfn);
> +
> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> + return mce_usable_address_amd(m, pfn);
> +
> return 1;
We definitely don't want to say that the address is usable on a third
vendor. That would most likely be a lie, even if we never actually
reach this code on a third vendor.
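IOW, the fall-through for any other vendor should rather be something
like:

	/* Unknown vendor: do not claim a usable address. */
	return 0;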
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.