Message-ID: <PH7SPRMB0012094F02A587B859AEFDF282769@PH7SPRMB0012.NAMPRD84.PROD.OUTLOOK.COM>
Date: Mon, 29 Aug 2022 21:37:56 +0000
From: "Kani, Toshi" <toshi.kani@....com>
To: Borislav Petkov <bp@...en8.de>,
Yazen Ghannam <yazen.ghannam@....com>,
"Elliott, Robert (Servers)" <elliott@....com>
CC: Jia He <justin.he@....com>, Len Brown <lenb@...nel.org>,
James Morse <james.morse@....com>,
Tony Luck <tony.luck@...el.com>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Robert Richter <rric@...nel.org>,
Robert Moore <robert.moore@...el.com>,
Qiuxu Zhuo <qiuxu.zhuo@...el.com>,
Jonathan Corbet <corbet@....net>,
Jan Luebbe <jlu@...gutronix.de>,
Khuong Dinh <khuong@...amperecomputing.com>,
Ard Biesheuvel <ardb@...nel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-edac@...r.kernel.org" <linux-edac@...r.kernel.org>,
"devel@...ica.org" <devel@...ica.org>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Shuai Xue <xueshuai@...ux.alibaba.com>,
Jarkko Sakkinen <jarkko@...nel.org>,
"linux-efi@...r.kernel.org" <linux-efi@...r.kernel.org>,
"nd@....com" <nd@....com>, "Paul E. McKenney" <paulmck@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Neeraj Upadhyay <quic_neeraju@...cinc.com>,
Randy Dunlap <rdunlap@...radead.org>,
Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
Muchun Song <songmuchun@...edance.com>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
"stable@...nel.org" <stable@...nel.org>
Subject: RE: [RESEND PATCH v3 3/9] EDAC/ghes: Make ghes_edac a proper module
to remove the dependency on ghes
On Monday, August 29, 2022 2:39 PM, Borislav Petkov wrote:
> On Mon, Aug 29, 2022 at 03:59:28PM +0000, Yazen Ghannam wrote:
> > GHES can be used for more than just memory errors. There are platforms where
> > memory errors are handled through the OS MCA, and PCIe AER errors are handled
> > through the FW, for example.
> >
> > Is the HPE Server platform guaranteed to always provide memory errors through
> > GHES regardless of CPU vendor/architecture?
>
> /me looks in the direction of HPE folks...
The HPE platforms enabled by the platform check are guaranteed to operate in
FW First mode, in which FW decides whether each error is reported to the OS via
GHES or by other means. These platforms may span multiple CPU
vendors/architectures.
On such platforms, for instance, FW does not report corrected errors to the OS,
since FW manages the thresholding and FRU notification itself. Chipset-specific
EDAC drivers, which are designed for OS First mode, are not needed on such
platforms. Enabling ghes_edac on them also has the effect of disabling those
OS First EDAC drivers.
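
As a rough illustration of that ownership logic, here is a toy model. It is
not the actual ghes_edac code: the allowlist, struct platform_id,
edac_owner and the *_like_init() functions are all hypothetical names. The
ghes_edac-like driver claims EDAC ownership only on allowlisted FW First
platforms, and a chipset (OS First) driver backs off when ownership is
already taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical platform identity (loosely modelled on ACPI OEM IDs). */
struct platform_id {
	const char *oem_id;
	const char *oem_table_id;
};

/* Hypothetical allowlist of platforms known to run in FW First mode. */
static const struct platform_id fw_first_platforms[] = {
	{ "HPE", "Server" },
};

/* Name of the driver that currently owns memory error reporting; NULL = none. */
static const char *edac_owner;

static bool platform_is_fw_first(const struct platform_id *p)
{
	assert(p != NULL);
	for (size_t i = 0; i < sizeof(fw_first_platforms) / sizeof(fw_first_platforms[0]); i++) {
		if (strcmp(p->oem_id, fw_first_platforms[i].oem_id) == 0 &&
		    strcmp(p->oem_table_id, fw_first_platforms[i].oem_table_id) == 0)
			return true;
	}
	return false;
}

/* ghes_edac-like driver: takes ownership only on allowlisted platforms. */
static bool ghes_edac_like_init(const struct platform_id *p)
{
	if (!platform_is_fw_first(p))
		return false;
	edac_owner = "ghes_edac";
	return true;
}

/* Chipset (OS First) driver: refuses to load if someone already owns EDAC. */
static bool chipset_edac_like_init(void)
{
	if (edac_owner != NULL)
		return false;
	edac_owner = "chipset_edac";
	return true;
}
```

The point of the model is only the ordering: once ghes_edac has claimed the
platform, the OS First driver sees an existing owner and does not register.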
OS MCA is still used for uncorrected errors such as SRAR (Software Recoverable
Action Required) errors, which require recovery action synchronous with
execution and are signalled via MCE.
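
To summarize the routing described above as a toy C sketch: this is a
simplification for illustration, not HPE firmware behaviour or a kernel
interface. The enum names and fw_first_route() are hypothetical, and
"FW decides" stands for FW choosing between GHES and other means.

```c
#include <assert.h>

/* Hypothetical, simplified error classes. */
enum err_class {
	ERR_CORRECTED,        /* corrected error */
	ERR_UNCORRECTED_SRAR, /* uncorrected, Software Recoverable Action Required */
	ERR_OTHER,            /* anything else, e.g. PCIe AER */
};

/* Hypothetical reporting paths on a FW First platform. */
enum report_path {
	PATH_FW_ONLY,    /* FW handles thresholding/FRU notification; OS not told */
	PATH_MCE,        /* OS MCA, signalled synchronously via machine check */
	PATH_FW_DECIDES, /* FW picks GHES or another mechanism */
};

static enum report_path fw_first_route(enum err_class c)
{
	assert(c >= ERR_CORRECTED && c <= ERR_OTHER);
	switch (c) {
	case ERR_CORRECTED:
		return PATH_FW_ONLY;
	case ERR_UNCORRECTED_SRAR:
		return PATH_MCE;
	default:
		return PATH_FW_DECIDES;
	}
}
```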
Toshi