Message-ID: <f629820c-50cf-7366-975e-68215b3f2bc5@amd.com>
Date: Tue, 9 May 2023 10:25:09 -0400
From: Yazen Ghannam <yazen.ghannam@....com>
To: Shuai Xue <xueshuai@...ux.alibaba.com>, bp@...en8.de,
tony.luck@...el.com
Cc: yazen.ghannam@....com, tglx@...utronix.de, mingo@...hat.com,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
baolin.wang@...ux.alibaba.com, linux-edac@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/mce/amd: init mce severity to handle deferred memory
failure

On 4/25/23 8:18 AM, Shuai Xue wrote:
> When a deferred UE error is detected, e.g. by the background patrol scrubber, it
> will be handled in the APIC interrupt handler amd_deferred_error_interrupt().
> The handler collects the MCA banks, initializes the mce struct, and processes it by
> notifying the registered MCE decode chain.
>
> The uc_decode_notifier, one of the notifiers in the MCE decode chain, handles
> memory failures, but only for MCE_AO_SEVERITY and MCE_DEFERRED_SEVERITY.
> However, the APIC interrupt handler does not initialize the mce severity, and
> the uninitialized severity is 0 (MCE_NO_SEVERITY).
>
> To handle the deferred memory failure case, initialize the mce severity when
> logging MCA banks.
>
> Signed-off-by: Shuai Xue <xueshuai@...ux.alibaba.com>
>
Hi Shuai Xue,

I think this patch is reasonable to do, but it won't have the intended
effect in practice.

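For readers following the quoted description, here is a minimal user-space model of the severity filter and of the fix the patch proposes. It is not kernel code: `struct sim_mce`, `enum sim_severity`, `sim_uc_decode_wants()` and `sim_log_deferred_bank()` are hypothetical names. The only facts taken from the thread are that the notifier accepts MCE_AO_SEVERITY and MCE_DEFERRED_SEVERITY, and that an unset severity is 0 (MCE_NO_SEVERITY); the rest of the enum ordering is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of MCE severity levels. Only "no severity == 0"
 * comes from the patch description; the other values are assumptions. */
enum sim_severity {
	SIM_NO_SEVERITY = 0,	/* what a zero-initialized record carries */
	SIM_DEFERRED_SEVERITY,
	SIM_AO_SEVERITY,
	SIM_AR_SEVERITY,
};

/* Hypothetical, cut-down stand-in for struct mce. */
struct sim_mce {
	uint64_t status;
	uint64_t addr;
	enum sim_severity severity;
};

/* Models the severity filter described for uc_decode_notifier: only
 * action-optional and deferred records go on to memory failure
 * handling. Returns true if the record would be handled. */
static bool sim_uc_decode_wants(const struct sim_mce *m)
{
	return m->severity == SIM_AO_SEVERITY ||
	       m->severity == SIM_DEFERRED_SEVERITY;
}

/* Models the patch's idea: set the severity while logging a bank from
 * the deferred-error interrupt, instead of leaving it at 0. */
static void sim_log_deferred_bank(struct sim_mce *m, uint64_t status,
				  uint64_t addr)
{
	m->status = status;
	m->addr = addr;
	m->severity = SIM_DEFERRED_SEVERITY;
}
```

A zero-initialized record is skipped by the filter; after sim_log_deferred_bank() it passes. As the reply explains, passing this filter is not enough in practice, because the address check must also succeed.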
The value in MCA_ADDR for DRAM ECC errors will be a memory controller
"normalized address". This is not a system physical address that the OS
can use to take action.

The mce_usable_address() function needs to be updated to handle this.
I'll send a patchset this week to do so. Afterwards, the
uc_decode_notifier will not attempt to handle these errors.

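A sketch of the kind of address check described, again as a user-space model and not the actual patchset, whose implementation may differ. The bank classification (`enum sim_bank_type`, `SIM_BANK_UMC`) and `sim_usable_address()` are hypothetical; real hardware identifies memory-controller banks in ways this model does not cover. The one architectural detail assumed is the MCi_STATUS.AddrV bit (bit 58), which indicates that MCi_ADDR holds a valid address.

```c
#include <stdbool.h>
#include <stdint.h>

/* MCi_STATUS.AddrV: MCi_ADDR contains a valid address (bit 58). */
#define SIM_STATUS_ADDRV (1ULL << 58)

/* Hypothetical bank classification for this model only. */
enum sim_bank_type {
	SIM_BANK_CORE,
	SIM_BANK_UMC,	/* memory controller bank: DRAM ECC errors */
};

struct sim_mce {
	uint64_t status;
	uint64_t addr;
	enum sim_bank_type bank_type;
};

/* An address is only usable by the OS if it is valid and is a system
 * physical address. Memory controller banks report a normalized
 * address, so this model rejects them outright. */
static bool sim_usable_address(const struct sim_mce *m)
{
	if (!(m->status & SIM_STATUS_ADDRV))
		return false;

	/* Normalized DRAM address: not a page the OS can offline. */
	if (m->bank_type == SIM_BANK_UMC)
		return false;

	return true;
}
```

With a check like this in place, a deferred DRAM ECC record is dropped before any severity test, which matches the stated outcome that uc_decode_notifier will not attempt to handle these errors.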
Thanks,
Yazen