Message-ID: <Z7fTzrl4cYuu5W9Y@agluck-desk3>
Date: Thu, 20 Feb 2025 17:15:58 -0800
From: "Luck, Tony" <tony.luck@...el.com>
To: Qiuxu Zhuo <qiuxu.zhuo@...el.com>
Cc: Borislav Petkov <bp@...en8.de>, James Morse <james.morse@....com>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Robert Richter <rric@...nel.org>,
Kevin Chang <kevin1.chang@...el.com>,
Thomas Chen <Thomas.Chen@...el.com>, linux-edac@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/1] EDAC/{skx_common,i10nm}: Fix some missing error reports on Emerald Rapids
On Fri, Feb 14, 2025 at 08:27:28AM +0800, Qiuxu Zhuo wrote:
> When doing error injection to some memory DIMMs on certain Intel Emerald
> Rapids servers, the i10nm_edac missed error reports for some memory DIMMs.
>
> Certain BIOS configurations may hide some memory controllers, and the
> i10nm_edac doesn't enumerate these hidden memory controllers. However, the
> ADXL decodes memory errors using memory controller physical indices even
> if there are hidden memory controllers. Therefore, the memory controller
> physical indices reported by the ADXL may mismatch the logical indices
> enumerated by the i10nm_edac, resulting in missed error reports for some
> memory DIMMs.
>
> Fix this issue by creating a mapping table from memory controller physical
> indices (used by the ADXL) to logical indices (used by the i10nm_edac) and
> using it to convert the physical indices to the logical indices during the
> error handling process.
>
> Fixes: c545f5e41225 ("EDAC/i10nm: Skip the absent memory controllers")
> Reported-by: Kevin Chang <kevin1.chang@...el.com>
> Tested-by: Kevin Chang <kevin1.chang@...el.com>
> Reported-by: Thomas Chen <Thomas.Chen@...el.com>
> Tested-by: Thomas Chen <Thomas.Chen@...el.com>
> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@...el.com>
Applied to the edac-drivers branch of the RAS tree.
Thanks.
-Tony
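
The mapping idea in the quoted commit message can be illustrated with a minimal, standalone C sketch. This is not the code from the patch: the names MAX_IMC, IMC_ABSENT, build_mc_mapping() and phys_to_logical(), and the use of a presence bitmap as input, are all assumptions made for illustration. It only shows how a table indexed by physical memory controller number can yield the logical index a driver assigned during enumeration, with hidden controllers marked as absent.

```c
#include <stdint.h>

#define MAX_IMC    16    /* hypothetical upper bound on controllers per socket */
#define IMC_ABSENT 0xff  /* hypothetical marker for a hidden controller */

/*
 * Build a physical -> logical index table.
 * Bit N of present_mask is set when physical controller N is visible.
 * Visible controllers get consecutive logical indices starting at 0;
 * hidden ones are marked IMC_ABSENT.
 * Returns the number of visible controllers.
 */
static int build_mc_mapping(uint32_t present_mask, uint8_t map[MAX_IMC])
{
	int logical = 0;

	for (int phys = 0; phys < MAX_IMC; phys++) {
		if (present_mask & (1u << phys))
			map[phys] = (uint8_t)logical++;
		else
			map[phys] = IMC_ABSENT;
	}
	return logical;
}

/*
 * Convert a physical index (as a decoder would report it) to the
 * logical index. Returns -1 for out-of-range or hidden controllers.
 */
static int phys_to_logical(const uint8_t map[MAX_IMC], unsigned int phys)
{
	if (phys >= MAX_IMC || map[phys] == IMC_ABSENT)
		return -1;
	return map[phys];
}
```

With a presence mask of 0xD (physical controllers 0, 2 and 3 visible, controller 1 hidden), physical index 2 converts to logical index 1. Without such a conversion, an error reported against physical controller 2 would be looked up as logical controller 2, which is the kind of mismatch the commit message describes.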